Run SmolVLM on Intel CPUs using OpenVINO and Optimum Intel in 3 steps
AI Impact Summary
Intel's OpenVINO/Optimum-Intel workflow enables running SmolVLM2-256M-Video-Instruct on CPU-only hardware by converting the model to OpenVINO IR and applying 8-bit quantization. The guide includes concrete commands for exporting, quantizing (weight-only or static), and running inference, with benchmarks showing up to 12x reduction in time to first token (TTFT) and 65x throughput gains when using OpenVINO 8-bit weight-only quantization (WOQ) on Intel CPUs. Quantization can affect accuracy, so validation is required before production, especially for vision components and multi-image prompts. This enables offline, privacy-preserving VLM deployment on devices without GPUs, expanding edge use cases.
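The export-and-quantize step described above can be sketched with Optimum Intel's CLI; the output directory name is an assumption, and the exact flags should be checked against your installed `optimum-intel` version:

```shell
# Install Optimum Intel with OpenVINO support
pip install "optimum[openvino]"

# Export SmolVLM2 to OpenVINO IR with 8-bit weight-only quantization (WOQ)
optimum-cli export openvino \
    --model HuggingFaceTB/SmolVLM2-256M-Video-Instruct \
    --weight-format int8 \
    smolvlm2_ov_int8
```

The resulting directory can then be loaded for CPU inference via `OVModelForVisualCausalLM.from_pretrained("smolvlm2_ov_int8")` from `optimum.intel`, which runs entirely offline once the model files are on disk.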
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info