SmolVLM introduces 256M and 500M models: smaller VLMs for edge and browser inference
AI Impact Summary
SmolVLM expands the family with 256M and 500M parameter models, comprising two base and two instruction-tuned checkpoints, optimized for efficient multimodal tasks. The new models switch to a smaller vision encoder (the 93M-parameter SigLIP base patch-16/512) while accepting a larger input resolution, preserving performance per parameter. Training updates emphasize document understanding through the Cauldron, Docmatix, and MathWriting data mixtures, improving DocVQA and related capabilities at a fraction of the footprint. The release also provides ONNX checkpoints and demos, enabling straightforward integration into existing inference pipelines via transformers and MLX, along with WebGPU demos for in-browser inference.
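To make the transformers integration path concrete, here is a minimal sketch of calling a SmolVLM checkpoint. The model ID (`HuggingFaceTB/SmolVLM-256M-Instruct`), the `AutoModelForVision2Seq`/`AutoProcessor` classes, and the chat-message layout are assumptions based on common Hugging Face VLM usage, not details confirmed by this summary.

```python
# Sketch only: the checkpoint name and API calls below are assumptions
# following standard Hugging Face VLM conventions.
MODEL_ID = "HuggingFaceTB/SmolVLM-256M-Instruct"  # assumed checkpoint name


def build_chat(question: str) -> list:
    """Build the chat-style message list most VLM processors expect:
    one user turn containing an image slot plus the text question."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image"},
                {"type": "text", "text": question},
            ],
        }
    ]


def answer(image, question: str, max_new_tokens: int = 128) -> str:
    """Load the model and generate an answer. Requires the transformers
    library and a network connection to fetch the checkpoint, so this
    function is defined but not executed here."""
    from transformers import AutoModelForVision2Seq, AutoProcessor

    processor = AutoProcessor.from_pretrained(MODEL_ID)
    model = AutoModelForVision2Seq.from_pretrained(MODEL_ID)
    # Turn the chat messages into the model's prompt format.
    prompt = processor.apply_chat_template(
        build_chat(question), add_generation_prompt=True
    )
    inputs = processor(text=prompt, images=[image], return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return processor.batch_decode(output_ids, skip_special_tokens=True)[0]
```

The same chat-message structure applies to the 500M checkpoint by swapping the model ID; the ONNX and MLX routes use different runtimes but an equivalent prompt layout.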
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info