Hugging Face Introduces SmolVLA: Efficient Vision-Language-Action Model
Action Required
Organizations can now leverage a powerful, open-source VLA model for robotics applications, potentially accelerating research and development efforts.
AI Impact Summary
SmolVLA is a new open-source vision-language-action (VLA) model for robotics. At 450M parameters it is compact, yet it outperforms larger models on both simulation and real-world tasks. Its key innovation is an asynchronous inference stack that delivers roughly 30% faster response times and 2x task throughput. The release also introduces a new training method and architecture, making robotics research and development more accessible, particularly for teams with limited hardware resources.
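The throughput gain from asynchronous inference comes from decoupling action-chunk prediction from action execution: the robot keeps executing the current chunk while the model computes the next one. The sketch below illustrates that pattern with plain Python threads and a queue; it is not the SmolVLA/LeRobot API, and `predict_chunk` is a hypothetical stand-in for the policy.

```python
import threading
import queue
import time

def predict_chunk(step: int, chunk_size: int = 4) -> list[int]:
    """Hypothetical stand-in for the VLA policy: returns a chunk of actions."""
    time.sleep(0.01)  # simulated model latency
    return [step * chunk_size + i for i in range(chunk_size)]

def run_async(num_chunks: int = 3) -> list[int]:
    """Execute action chunks while the next chunk is predicted in parallel."""
    chunks: queue.Queue = queue.Queue(maxsize=1)
    executed: list[int] = []

    def inference_loop() -> None:
        # Prediction overlaps with execution instead of blocking it.
        for step in range(num_chunks):
            chunks.put(predict_chunk(step))
        chunks.put([])  # empty chunk signals completion

    worker = threading.Thread(target=inference_loop)
    worker.start()
    while True:
        chunk = chunks.get()
        if not chunk:
            break
        for action in chunk:
            executed.append(action)  # "execute" while the next chunk is computed
    worker.join()
    return executed

print(run_async())  # actions 0..11, executed in order
```

In a synchronous loop, the robot would sit idle during every `predict_chunk` call; overlapping the two phases is what yields the reported response-time and throughput improvements.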
Affected Systems
- Date: 3 Jun 2025
- Change type: capability
- Severity: medium