SmolVLM 2B Vision-Language Model enables edge deployments with low memory footprint
AI Impact Summary
SmolVLM is a newly released 2B-parameter vision-language model with a strong emphasis on memory efficiency, enabling on-device inference and edge deployment. It adopts SmolLM2 1.7B as its language backbone and applies aggressive 9x compression of visual information via pixel shuffle: a SigLIP backbone encodes 384x384 image patches with a 14x14 inner patch size, and the shuffle folds each 3x3 block of visual tokens into one, bringing the minimum GPU RAM requirement down to roughly 5.02 GB. As an open-source Apache 2.0 project with ready-made variants (Base, Synthetic, Instruct) and open training pipelines, it lowers the barrier to local multimodal workloads and could shift some inference from cloud services to client devices.
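A minimal sketch of that token compression, assuming a plain space-to-depth pixel shuffle in PyTorch; the 27x27 token grid, the SigLIP hidden size of 1152, and the tensor layout are illustrative assumptions, not the released SmolVLM implementation.

```python
import torch

def pixel_shuffle(x: torch.Tensor, scale: int = 3) -> torch.Tensor:
    """Fold each (scale x scale) block of visual tokens into the channel
    dimension, cutting the token count by scale**2 (9x for scale=3)."""
    b, n, c = x.shape                       # (batch, tokens, hidden)
    h = w = int(n ** 0.5)                   # assume a square token grid
    x = x.view(b, h // scale, scale, w // scale, scale, c)
    x = x.permute(0, 1, 3, 2, 4, 5)         # group each scale x scale block
    return x.reshape(b, (h // scale) * (w // scale), c * scale * scale)

# A 384x384 patch with 14x14 SigLIP patches yields a 27x27 grid (729 tokens);
# after the 3x3 shuffle, 81 wider tokens remain -- the 9x reduction.
tokens = torch.randn(1, 729, 1152)          # 1152: assumed SigLIP hidden size
print(pixel_shuffle(tokens).shape)          # torch.Size([1, 81, 10368])
```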
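For the local-deployment claim, a hedged sketch of loading the Instruct variant with transformers; the repo id HuggingFaceTB/SmolVLM-Instruct and the bfloat16 dtype are assumptions to verify against the model card.

```python
# Sketch: load the Instruct variant for local inference via transformers.
# bfloat16 is a half-precision choice consistent with the ~5 GB RAM figure.
import torch
from transformers import AutoProcessor, AutoModelForVision2Seq

processor = AutoProcessor.from_pretrained("HuggingFaceTB/SmolVLM-Instruct")
model = AutoModelForVision2Seq.from_pretrained(
    "HuggingFaceTB/SmolVLM-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",  # place weights on the available GPU/CPU
)
```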
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info