OpenVINO GenAI with Optimum-Intel for edge deployment of Hugging Face Transformers
AI Impact Summary
The article describes a capability to optimize and deploy Hugging Face Transformers models using Optimum-Intel and OpenVINO GenAI for edge/client scenarios. It covers exporting models to OpenVINO IR, applying weight-only quantization (INT8/INT4) via NNCF with AWQ, and deploying through OpenVINO GenAI's Python and C++ APIs (LLMPipeline), targeting Intel hardware. This enables lower latency and a smaller deployment footprint at the edge, but requires migrating model export, quantization, and deployment pipelines to the OpenVINO GenAI workflow.
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info