SmolVLM 2B Vision-Language Model released with Apache 2.0 license for on-device deployment
AI Impact Summary
SmolVLM introduces a compact 2B vision-language model family that emphasizes memory efficiency and edge deployability. By pairing SmolLM2 1.7B as the language backbone with a 384x384 patch-based vision encoder in place of larger components, it achieves lower GPU memory usage and faster prefill and throughput than larger VLMs, making on-device interaction feasible. The open-source release (Apache 2.0) includes SmolVLM-Base, SmolVLM-Synthetic, and SmolVLM-Instruct, all integrated with transformers, which lowers the barrier to customization and on-premises experimentation but still requires hardware planning and model selection for each deployment scenario.
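As a rough illustration of the hardware planning mentioned above, the sketch below estimates weight-only VRAM for a model of this size at different precisions. The ~2e9 parameter count and dtype sizes are assumptions for illustration, not published figures, and the estimate ignores activations and KV cache:

```python
def estimate_vram_gb(num_params: float, bytes_per_param: int) -> float:
    """Weight-only VRAM estimate in GB; excludes activations and KV cache."""
    return num_params * bytes_per_param / 1e9

# Assumed ~2e9 parameters for a "2B" VLM (illustrative, not an official figure).
PARAMS = 2e9
for dtype, nbytes in [("fp32", 4), ("bf16", 2), ("int8", 1)]:
    print(f"{dtype}: ~{estimate_vram_gb(PARAMS, nbytes):.1f} GB")
```

At bf16 the weights alone come to roughly 4 GB, which is what makes a 2B VLM plausible on consumer GPUs and high-end edge devices.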
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info