SmolLM3: fully open 3B multilingual reasoner with 128k context
AI Impact Summary
SmolLM3 is a fully open 3B model optimized for multilingual long-context reasoning with a 128k-token context window. It combines architectural changes (Grouped Query Attention, NoPE, intra-document masking) with a three-stage pretraining plus mid-training pipeline, training on 11.2T tokens over 24 days on 384 H100 GPUs. On benchmarks it outperforms Llama-3.2-3B and Qwen2.5-3B and stays competitive with larger 4B models such as Qwen3 and Gemma3, indicating strong compute efficiency for long-document and reasoning workloads. The release includes the complete training blueprint, so teams can reproduce or adapt the methodology, which could influence model procurement and MLOps strategy for multilingual, long-context applications.
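To make the architectural point concrete, the sketch below shows the core idea of Grouped Query Attention in plain PyTorch: several query heads share one key/value head, which shrinks the KV cache at long context lengths. The head counts and dimensions are illustrative assumptions, not SmolLM3's actual configuration.

```python
# Minimal sketch of Grouped Query Attention (GQA). Head counts and
# dimensions are illustrative, not SmolLM3's actual configuration.
import torch
import torch.nn.functional as F

def grouped_query_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """q: (batch, n_q_heads, seq, dim); k, v: (batch, n_kv_heads, seq, dim)."""
    group = q.shape[1] // k.shape[1]
    # Repeat each KV head so that `group` query heads attend to the same K/V,
    # keeping the cached K/V tensors n_q_heads / n_kv_heads times smaller.
    k = k.repeat_interleave(group, dim=1)
    v = v.repeat_interleave(group, dim=1)
    return F.scaled_dot_product_attention(q, k, v, is_causal=True)

# 16 query heads sharing 4 KV heads -> a 4x smaller KV cache.
q = torch.randn(1, 16, 128, 64)
k = torch.randn(1, 4, 128, 64)
v = torch.randn(1, 4, 128, 64)
out = grouped_query_attention(q, k, v)  # shape: (1, 16, 128, 64)
```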
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info