SmolVLM introduces 256M and 500M models: smaller VLMs for edge and browser inference
AI Impact Summary
SmolVLM expands the family with 256M and 500M parameter models, comprising two base and two instruction-tuned checkpoints, optimized for efficient multimodal tasks. The new models switch to a smaller vision encoder (the 93M-parameter SigLIP base patch-16/512) while accepting a larger input resolution, preserving performance per parameter. Training updates emphasize document understanding through the Cauldron, Docmatix, and MathWriting data mixtures, improving DocVQA and related capabilities at a fraction of the footprint. The release also provides ONNX checkpoints and demos, enabling straightforward integration into existing inference pipelines via transformers and MLX, along with WebGPU demos for in-browser inference.
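To make the transformers integration path concrete, here is a minimal sketch of calling a SmolVLM checkpoint. The model ID (`HuggingFaceTB/SmolVLM-256M-Instruct`), the `AutoModelForVision2Seq`/`AutoProcessor` classes, and the chat-message layout are assumptions based on common Hugging Face VLM usage, not details confirmed by this summary.

```python
# Sketch only: the checkpoint name and API calls below are assumptions
# following standard Hugging Face VLM conventions.
MODEL_ID = "HuggingFaceTB/SmolVLM-256M-Instruct"  # assumed checkpoint name


def build_chat(question: str) -> list:
    """Build the chat-style message list most VLM processors expect:
    one user turn containing an image slot plus the text question."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image"},
                {"type": "text", "text": question},
            ],
        }
    ]


def answer(image, question: str, max_new_tokens: int = 128) -> str:
    """Load the model and generate an answer. Requires the transformers
    library and a network connection to fetch the checkpoint, so this
    function is defined but not executed here."""
    from transformers import AutoModelForVision2Seq, AutoProcessor

    processor = AutoProcessor.from_pretrained(MODEL_ID)
    model = AutoModelForVision2Seq.from_pretrained(MODEL_ID)
    # Turn the chat messages into the model's prompt format.
    prompt = processor.apply_chat_template(
        build_chat(question), add_generation_prompt=True
    )
    inputs = processor(text=prompt, images=[image], return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return processor.batch_decode(output_ids, skip_special_tokens=True)[0]
```

The same chat-message structure applies to the 500M checkpoint by swapping the model ID; the ONNX and MLX routes use different runtimes but an equivalent prompt layout.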
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info