SmolVLA: Efficient Vision-Language-Action Model Trained on LeRobot Community Data
AI Impact Summary
SmolVLA is an open-source, compact vision-language-action (VLA) model trained on community-contributed LeRobot datasets, offering a significant opportunity for robotics research. Its key features, a 450M-parameter size, asynchronous inference, and training on affordable hardware, democratize access to VLAs and accelerate research toward generalist robotic agents. The architecture pairs a SmolVLM2 vision-language model with a flow-matching transformer action expert, a design aimed at efficient and robust action prediction.
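To make the flow-matching idea concrete, the sketch below shows the inference-time mechanics: starting from noise, an action sample is produced by Euler-integrating a velocity field from t=0 to t=1. This is a minimal illustrative sketch, not SmolVLA's implementation; the closed-form `toy_velocity` field and the 3-DoF `target` action are hypothetical stand-ins for the learned action-expert transformer, which in the real model is conditioned on SmolVLM2 features.

```python
import numpy as np

def euler_flow_integrate(velocity_fn, x0, num_steps=10):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed Euler steps."""
    x = x0.astype(float).copy()
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        x = x + dt * velocity_fn(x, t)
    return x

# Toy stand-in for the learned velocity field: it pushes the sample
# toward `target` so that x reaches the target exactly at t=1.
# In SmolVLA this role is played by the flow-matching action expert.
target = np.array([0.5, -0.2, 0.1])      # hypothetical 3-DoF action
def toy_velocity(x, t):
    return (target - x) / (1.0 - t)

rng = np.random.default_rng(0)
noise = rng.standard_normal(3)           # start from Gaussian noise
action = euler_flow_integrate(toy_velocity, noise, num_steps=10)
```

With this particular field, the Euler scheme recovers the target action exactly after the final step; a learned velocity network would instead produce a sample from the action distribution implied by training data.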
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info