SmolVLM 2B Vision-Language Model enables edge deployments with low memory footprint
AI Impact Summary
SmolVLM is a newly released 2B-parameter vision-language model with a strong emphasis on memory efficiency, enabling on-device inference and edge deployment. It adopts SmolLM2 1.7B as its language backbone and applies aggressive 9x compression of visual information via pixel shuffle: a SigLIP backbone encodes 384x384 image patches with a 14x14 inner patch size, and the shuffle folds each 3x3 block of visual tokens into one, bringing the minimum GPU RAM requirement down to roughly 5.02 GB. As an open-source Apache 2.0 project with ready-made variants (Base, Synthetic, Instruct) and open training pipelines, it lowers the barrier to local multimodal workloads and could shift some inference from cloud services to client devices.
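A minimal sketch of that token compression, assuming a plain space-to-depth pixel shuffle in PyTorch; the 27x27 token grid, the SigLIP hidden size of 1152, and the tensor layout are illustrative assumptions, not the released SmolVLM implementation.

```python
import torch

def pixel_shuffle(x: torch.Tensor, scale: int = 3) -> torch.Tensor:
    """Fold each (scale x scale) block of visual tokens into the channel
    dimension, cutting the token count by scale**2 (9x for scale=3)."""
    b, n, c = x.shape                       # (batch, tokens, hidden)
    h = w = int(n ** 0.5)                   # assume a square token grid
    x = x.view(b, h // scale, scale, w // scale, scale, c)
    x = x.permute(0, 1, 3, 2, 4, 5)         # group each scale x scale block
    return x.reshape(b, (h // scale) * (w // scale), c * scale * scale)

# A 384x384 patch with 14x14 SigLIP patches yields a 27x27 grid (729 tokens);
# after the 3x3 shuffle, 81 wider tokens remain -- the 9x reduction.
tokens = torch.randn(1, 729, 1152)          # 1152: assumed SigLIP hidden size
print(pixel_shuffle(tokens).shape)          # torch.Size([1, 81, 10368])
```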
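For the local-deployment claim, a hedged sketch of loading the Instruct variant with transformers; the repo id HuggingFaceTB/SmolVLM-Instruct and the bfloat16 dtype are assumptions to verify against the model card.

```python
# Sketch: load the Instruct variant for local inference via transformers.
# bfloat16 is a half-precision choice consistent with the ~5 GB RAM figure.
import torch
from transformers import AutoProcessor, AutoModelForVision2Seq

processor = AutoProcessor.from_pretrained("HuggingFaceTB/SmolVLM-Instruct")
model = AutoModelForVision2Seq.from_pretrained(
    "HuggingFaceTB/SmolVLM-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",  # place weights on the available GPU/CPU
)
```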
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info