Optimum Intel adds OpenVINO Runtime support for Transformer inference on Intel CPUs
AI Impact Summary
Intel and Hugging Face have integrated OpenVINO into Optimum Intel, enabling Transformer inference on a wide range of Intel processors via the OpenVINO Runtime. The workflow supports post-training static quantization with OVQuantizer using a 300-sample calibration dataset, exports the model to the OpenVINO XML/BIN format, and runs inference with OVModelForImageClassification. Initial support focuses on encoder models such as ViT; encoder-decoder quantization is planned for a future OpenVINO release. Reported results show memory usage dropping from 344 MB to 90 MB and per-sample latency improving from 98 ms to 41 ms, though the first inference incurs a one-time warmup overhead. To adopt, install optimum[openvino,nncf], run calibration, export the quantized model, and validate its accuracy against the original baseline.
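The adoption steps above can be sketched with Optimum Intel's OpenVINO classes. This is a minimal sketch, assuming `optimum[openvino,nncf]` is installed; the `food101` calibration dataset, the preprocessing function, and the function names here are illustrative choices, not specifics from the summary.

```python
# Sketch of the quantize-export-infer workflow, assuming Optimum Intel is
# installed via: pip install "optimum[openvino,nncf]"
# Dataset and preprocessing choices below are illustrative assumptions.

def quantize_and_export(model_id: str, save_dir: str) -> None:
    """Post-training static quantization of a ViT classifier with OVQuantizer."""
    from functools import partial
    from transformers import AutoFeatureExtractor, AutoModelForImageClassification
    from optimum.intel.openvino import OVQuantizer

    model = AutoModelForImageClassification.from_pretrained(model_id)
    feature_extractor = AutoFeatureExtractor.from_pretrained(model_id)

    def preprocess(examples, feature_extractor):
        # Convert raw images into model inputs for calibration.
        return feature_extractor(examples["image"], return_tensors="pt")

    quantizer = OVQuantizer.from_pretrained(model)
    # 300-sample calibration set, matching the reported workflow.
    calibration_dataset = quantizer.get_calibration_dataset(
        "food101",  # illustrative dataset choice
        num_samples=300,
        dataset_split="train",
        preprocess_function=partial(preprocess, feature_extractor=feature_extractor),
    )
    # Writes the quantized OpenVINO XML/BIN pair to save_dir.
    quantizer.quantize(calibration_dataset=calibration_dataset, save_directory=save_dir)
    # Save the preprocessor alongside so inference can reload it from save_dir.
    feature_extractor.save_pretrained(save_dir)


def run_inference(save_dir: str, image):
    """Load the exported model and run it with the OpenVINO Runtime."""
    from transformers import AutoFeatureExtractor
    from optimum.intel.openvino import OVModelForImageClassification

    ov_model = OVModelForImageClassification.from_pretrained(save_dir)
    feature_extractor = AutoFeatureExtractor.from_pretrained(save_dir)
    inputs = feature_extractor(image, return_tensors="pt")
    # The first call includes OpenVINO warmup overhead; benchmark later calls.
    return ov_model(**inputs).logits
```

After quantization, comparing this model's accuracy on a held-out split against the original FP32 checkpoint is the validation step the summary recommends.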
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info