SmolLM3: fully open 3B multilingual reasoner with 128k context
AI Impact Summary
SmolLM3 is a fully open 3B model optimized for multilingual long-context reasoning with a 128k-token context window. It combines architectural changes (Grouped Query Attention, NoPE, intra-document masking) with a three-stage pretraining plus mid-training pipeline, training on 11.2T tokens over 24 days on 384 H100 GPUs. On benchmarks it outperforms Llama-3.2-3B and Qwen2.5-3B and stays competitive with larger 4B models such as Qwen3 and Gemma3, indicating strong compute efficiency for long-document and reasoning workloads. The release includes the complete training blueprint, so teams can reproduce or adapt the methodology, which could influence model procurement and MLOps strategy for multilingual, long-context applications.
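To make the architectural point concrete, the sketch below shows the core idea of Grouped Query Attention in plain PyTorch: several query heads share one key/value head, which shrinks the KV cache at long context lengths. The head counts and dimensions are illustrative assumptions, not SmolLM3's actual configuration.

```python
# Minimal sketch of Grouped Query Attention (GQA). Head counts and
# dimensions are illustrative, not SmolLM3's actual configuration.
import torch
import torch.nn.functional as F

def grouped_query_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """q: (batch, n_q_heads, seq, dim); k, v: (batch, n_kv_heads, seq, dim)."""
    group = q.shape[1] // k.shape[1]
    # Repeat each KV head so that `group` query heads attend to the same K/V,
    # keeping the cached K/V tensors n_q_heads / n_kv_heads times smaller.
    k = k.repeat_interleave(group, dim=1)
    v = v.repeat_interleave(group, dim=1)
    return F.scaled_dot_product_attention(q, k, v, is_causal=True)

# 16 query heads sharing 4 KV heads -> a 4x smaller KV cache.
q = torch.randn(1, 16, 128, 64)
k = torch.randn(1, 4, 128, 64)
v = torch.randn(1, 4, 128, 64)
out = grouped_query_attention(q, k, v)  # shape: (1, 16, 128, 64)
```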
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info