SmolLM3 3B model enables 128k-context multilingual reasoning
AI Impact Summary
SmolLM3 is a 3B-parameter decoder-only model optimized for long-context multilingual reasoning, incorporating grouped-query attention (GQA), NoPE, and intra-document masking to support contexts of up to 128k tokens at inference. It was trained on 11.2T tokens of web, math, and code data, with a mid-training phase to boost reasoning capabilities; the run used 384 H100 GPUs over 24 days. The team also provides the full architectural blueprint and data recipe, enabling rapid replication and cost-efficient deployment for teams needing long-context capabilities in six languages.
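The intra-document masking mentioned above restricts attention so tokens in a packed training sequence cannot attend across document boundaries. A minimal sketch of such a mask, assuming per-token document IDs are available (the function name and API here are illustrative, not from the SmolLM3 codebase):

```python
import numpy as np

def intra_document_mask(doc_ids):
    """Build a causal attention mask that also blocks attention across
    document boundaries (intra-document masking).

    doc_ids: 1-D sequence of per-token document indices within a packed
    training sequence, e.g. [0, 0, 0, 1, 1, 2].
    Returns a boolean matrix where mask[i, j] is True iff token i may
    attend to token j.
    """
    doc_ids = np.asarray(doc_ids)
    n = doc_ids.shape[0]
    causal = np.tril(np.ones((n, n), dtype=bool))    # only attend to j <= i
    same_doc = doc_ids[:, None] == doc_ids[None, :]  # only within one document
    return causal & same_doc

# Three packed documents of lengths 3, 2, 1:
mask = intra_document_mask([0, 0, 0, 1, 1, 2])
```

In this sketch, token 3 (first token of the second document) cannot attend to token 2 (last token of the first document), even though a plain causal mask would allow it.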
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info