SmolVLM 2B Vision-Language Model released with Apache 2.0 license for on-device deployment
AI Impact Summary
SmolVLM introduces a compact 2B vision-language model family that emphasizes memory efficiency and edge deployability. By pairing SmolLM2 1.7B as the language backbone with a 384x384 patch-based vision encoder in place of larger components, it achieves lower GPU memory usage and faster prefill and throughput than larger VLMs, making on-device interaction feasible. The open-source release (Apache 2.0) includes SmolVLM-Base, SmolVLM-Synthetic, and SmolVLM-Instruct, all integrated with transformers, which lowers the barrier to customization and on-premises experimentation but still requires hardware planning and model selection for each deployment scenario.
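As a rough illustration of the hardware planning mentioned above, the sketch below estimates weight-only VRAM for a model of this size at different precisions. The ~2e9 parameter count and dtype sizes are assumptions for illustration, not published figures, and the estimate ignores activations and KV cache:

```python
def estimate_vram_gb(num_params: float, bytes_per_param: int) -> float:
    """Weight-only VRAM estimate in GB; excludes activations and KV cache."""
    return num_params * bytes_per_param / 1e9

# Assumed ~2e9 parameters for a "2B" VLM (illustrative, not an official figure).
PARAMS = 2e9
for dtype, nbytes in [("fp32", 4), ("bf16", 2), ("int8", 1)]:
    print(f"{dtype}: ~{estimate_vram_gb(PARAMS, nbytes):.1f} GB")
```

At bf16 the weights alone come to roughly 4 GB, which is what makes a 2B VLM plausible on consumer GPUs and high-end edge devices.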
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info