Optimum Intel adds OpenVINO Runtime support for Transformer inference on Intel CPUs
AI Impact Summary
Intel and Hugging Face have integrated OpenVINO into Optimum Intel, enabling Transformer inference on a wide range of Intel processors via the OpenVINO Runtime. The workflow supports post-training static quantization with OVQuantizer using a 300-sample calibration dataset, exports the model to the OpenVINO XML/BIN format, and runs inference with OVModelForImageClassification. Initial support focuses on encoder models such as ViT; encoder-decoder quantization is planned for a future OpenVINO release. Reported results show memory usage dropping from 344 MB to 90 MB and per-sample latency improving from 98 ms to 41 ms, though the first inference incurs a one-time warmup overhead. To adopt, install optimum[openvino,nncf], run calibration, export the quantized model, and validate its accuracy against the original baseline.
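The adoption steps above can be sketched with Optimum Intel's OpenVINO classes. This is a minimal sketch, assuming `optimum[openvino,nncf]` is installed; the `food101` calibration dataset, the preprocessing function, and the function names here are illustrative choices, not specifics from the summary.

```python
# Sketch of the quantize-export-infer workflow, assuming Optimum Intel is
# installed via: pip install "optimum[openvino,nncf]"
# Dataset and preprocessing choices below are illustrative assumptions.

def quantize_and_export(model_id: str, save_dir: str) -> None:
    """Post-training static quantization of a ViT classifier with OVQuantizer."""
    from functools import partial
    from transformers import AutoFeatureExtractor, AutoModelForImageClassification
    from optimum.intel.openvino import OVQuantizer

    model = AutoModelForImageClassification.from_pretrained(model_id)
    feature_extractor = AutoFeatureExtractor.from_pretrained(model_id)

    def preprocess(examples, feature_extractor):
        # Convert raw images into model inputs for calibration.
        return feature_extractor(examples["image"], return_tensors="pt")

    quantizer = OVQuantizer.from_pretrained(model)
    # 300-sample calibration set, matching the reported workflow.
    calibration_dataset = quantizer.get_calibration_dataset(
        "food101",  # illustrative dataset choice
        num_samples=300,
        dataset_split="train",
        preprocess_function=partial(preprocess, feature_extractor=feature_extractor),
    )
    # Writes the quantized OpenVINO XML/BIN pair to save_dir.
    quantizer.quantize(calibration_dataset=calibration_dataset, save_directory=save_dir)
    # Save the preprocessor alongside so inference can reload it from save_dir.
    feature_extractor.save_pretrained(save_dir)


def run_inference(save_dir: str, image):
    """Load the exported model and run it with the OpenVINO Runtime."""
    from transformers import AutoFeatureExtractor
    from optimum.intel.openvino import OVModelForImageClassification

    ov_model = OVModelForImageClassification.from_pretrained(save_dir)
    feature_extractor = AutoFeatureExtractor.from_pretrained(save_dir)
    inputs = feature_extractor(image, return_tensors="pt")
    # The first call includes OpenVINO warmup overhead; benchmark later calls.
    return ov_model(**inputs).logits
```

After quantization, comparing this model's accuracy on a held-out split against the original FP32 checkpoint is the validation step the summary recommends.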
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info