Run SmolVLM on Intel CPUs using OpenVINO and Optimum Intel in 3 steps
AI Impact Summary
Intel's OpenVINO/Optimum-Intel workflow enables running SmolVLM2-256M-Video-Instruct on CPU-only hardware by converting the model to OpenVINO IR and applying 8-bit quantization. The guide includes concrete commands for exporting, quantizing (weight-only or static), and running inference, with benchmarks showing up to 12x reduction in time to first token (TTFT) and 65x throughput gains when using OpenVINO 8-bit weight-only quantization (WOQ) on Intel CPUs. Quantization can affect accuracy, so validation is required before production, especially for vision components and multi-image prompts. This enables offline, privacy-preserving VLM deployment on devices without GPUs, expanding edge use cases.
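The export-and-quantize step described above can be sketched with Optimum Intel's CLI; the output directory name is an assumption, and the exact flags should be checked against your installed `optimum-intel` version:

```shell
# Install Optimum Intel with OpenVINO support
pip install "optimum[openvino]"

# Export SmolVLM2 to OpenVINO IR with 8-bit weight-only quantization (WOQ)
optimum-cli export openvino \
    --model HuggingFaceTB/SmolVLM2-256M-Video-Instruct \
    --weight-format int8 \
    smolvlm2_ov_int8
```

The resulting directory can then be loaded for CPU inference via `OVModelForVisualCausalLM.from_pretrained("smolvlm2_ov_int8")` from `optimum.intel`, which runs entirely offline once the model files are on disk.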
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info