SmolLM3 3B model enables 128k-context multilingual reasoning
AI Impact Summary
SmolLM3 is a 3B-parameter decoder-only model optimized for long-context multilingual reasoning, incorporating grouped-query attention (GQA), NoPE, and intra-document masking to support contexts of up to 128k tokens at inference. It was trained on 11.2T tokens of web, math, and code data, with a mid-training phase to boost reasoning capabilities; the run used 384 H100 GPUs over 24 days. The team also provides the full architectural blueprint and data recipe, enabling rapid replication and cost-efficient deployment for teams needing long-context capabilities in six languages.
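The intra-document masking mentioned above restricts attention so tokens in a packed training sequence cannot attend across document boundaries. A minimal sketch of such a mask, assuming per-token document IDs are available (the function name and API here are illustrative, not from the SmolLM3 codebase):

```python
import numpy as np

def intra_document_mask(doc_ids):
    """Build a causal attention mask that also blocks attention across
    document boundaries (intra-document masking).

    doc_ids: 1-D sequence of per-token document indices within a packed
    training sequence, e.g. [0, 0, 0, 1, 1, 2].
    Returns a boolean matrix where mask[i, j] is True iff token i may
    attend to token j.
    """
    doc_ids = np.asarray(doc_ids)
    n = doc_ids.shape[0]
    causal = np.tril(np.ones((n, n), dtype=bool))    # only attend to j <= i
    same_doc = doc_ids[:, None] == doc_ids[None, :]  # only within one document
    return causal & same_doc

# Three packed documents of lengths 3, 2, 1:
mask = intra_document_mask([0, 0, 0, 1, 1, 2])
```

In this sketch, token 3 (first token of the second document) cannot attend to token 2 (last token of the first document), even though a plain causal mask would allow it.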
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info