Meta Llama 4 Maverick & Scout released on Hugging Face Hub with MoE and multimodal support
AI Impact Summary
Meta releases Llama 4 Maverick (17B active, ~400B total parameters) and Llama 4 Scout (17B active, ~109B total parameters) as Mixture-of-Experts (MoE) models with native multimodal support on the Hugging Face Hub, tightly integrated with transformers and Text Generation Inference (TGI) for scalable loading and inference. Scout targets single-GPU deployment via on-the-fly quantization (4-bit/8-bit), while Maverick offers BF16 and FP8 paths and uses a 128-expert MoE configuration; long-context support reaches 1M tokens for Maverick Instruct and 10M for Scout. Access is governed by the Llama 4 Community License Agreement, and the models are distributed via the Xet storage backend, which promises faster uploads/downloads and higher deduplication for derivative models. This release broadens production-ready multimodal inference options in Hugging Face workflows and may influence deployment choices based on hardware availability and licensing terms.
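A back-of-envelope estimate illustrates why 4-bit quantization is what makes Scout's single-GPU target plausible. This is a sketch, not a figure from the release: the 80 GB budget (an H100-class accelerator) is an assumption, and the arithmetic counts weight storage only, ignoring KV cache and activation memory.

```python
# Rough weight-memory estimate for Llama 4 Scout (~109B total params, MoE).
# Assumption for illustration: an 80 GB single accelerator (H100-class).

TOTAL_PARAMS = 109e9   # ~109B total parameters (17B active per token)
GPU_MEMORY_GB = 80     # assumed single-GPU budget, not from the release notes

def weight_footprint_gb(params: float, bytes_per_param: float) -> float:
    """Approximate storage for weights alone; excludes KV cache and activations."""
    return params * bytes_per_param / 1e9

bf16 = weight_footprint_gb(TOTAL_PARAMS, 2.0)   # 16-bit weights
int8 = weight_footprint_gb(TOTAL_PARAMS, 1.0)   # 8-bit quantized
int4 = weight_footprint_gb(TOTAL_PARAMS, 0.5)   # 4-bit quantized

print(f"BF16: {bf16:.1f} GB, 8-bit: {int8:.1f} GB, 4-bit: {int4:.1f} GB")
```

Under these assumptions, BF16 weights alone (~218 GB) far exceed a single 80 GB device, 8-bit (~109 GB) still does not fit, and only 4-bit (~54.5 GB) leaves headroom for the KV cache, which is consistent with the release pairing Scout's single-GPU story with on-the-fly 4-bit quantization.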
Affected Systems
- Date: not specified
- Change type: capability
- Severity: medium