Hugging Face: Transformers-based LLM optimization in production: 8/4-bit precision, Flash Attention, and MQA