Convert Hugging Face Transformers to ONNX via torch.onnx, transformers.onnx, and Optimum
AI Impact Summary
Hugging Face Transformers models can be exported to ONNX through three approaches at different abstraction levels: a low-level path using torch.onnx, a mid-level path via transformers.onnx, and a high-level Optimum Inference path built on ORTModelForSequenceClassification. The guidance demonstrates end-to-end examples for a real model (distilbert-base-uncased-finetuned-sst-2-english) and notes the dependencies and export settings (input_names, dynamic_axes, opset) that affect portability and runtime behavior. This expands deployment options to ONNX Runtime, enabling cross-platform inference and potential latency improvements, but it requires agreement on the chosen method and updates to build pipelines and dependencies. Teams should plan to adjust CI/CD to install optional extras (e.g., optimum[onnxruntime]) and to modify inference code to either run graphs exported via torch.onnx or transformers.onnx in an ONNX Runtime session, or load ORT-based model classes from Optimum, depending on the selected path.
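The three paths can be sketched as follows. These are minimal, illustrative examples rather than the original guidance's exact code: the output filenames, the opset version, the sample sentence, and the use of export=True on newer Optimum releases are assumptions.

A sketch of the low-level torch.onnx path, where input_names, dynamic_axes, and the opset must be spelled out by hand:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# return_dict=False so the tracer sees plain tensor outputs instead of a ModelOutput
model = AutoModelForSequenceClassification.from_pretrained(model_id, return_dict=False)
model.eval()

sample = tokenizer("This movie was great!", return_tensors="pt")

torch.onnx.export(
    model,
    (sample["input_ids"], sample["attention_mask"]),
    "distilbert-sst2.onnx",                        # output path (assumed name)
    input_names=["input_ids", "attention_mask"],
    output_names=["logits"],
    dynamic_axes={                                 # allow variable batch size and sequence length
        "input_ids": {0: "batch", 1: "sequence"},
        "attention_mask": {0: "batch", 1: "sequence"},
        "logits": {0: "batch"},
    },
    opset_version=14,                              # assumed opset; pick one your runtime supports
)
```

A sketch of the mid-level transformers.onnx path, which derives inputs, outputs, and dynamic axes from a model-specific OnnxConfig (the CLI equivalent is python -m transformers.onnx --model=... --feature=sequence-classification onnx/):

```python
from pathlib import Path
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from transformers.onnx import FeaturesManager, export

model_id = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

# Look up the ONNX config that matches the sequence-classification feature
model_type, onnx_config_cls = FeaturesManager.check_supported_model_or_raise(
    model, feature="sequence-classification"
)
onnx_config = onnx_config_cls(model.config)

onnx_inputs, onnx_outputs = export(
    preprocessor=tokenizer,
    model=model,
    config=onnx_config,
    opset=onnx_config.default_onnx_opset,
    output=Path("onnx/model.onnx"),   # assumed output location
)
```

A sketch of the high-level Optimum Inference path, which exports and loads the model behind a Transformers-like API and assumes pip install optimum[onnxruntime] has been run:

```python
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer, pipeline

model_id = "distilbert-base-uncased-finetuned-sst-2-english"

# export=True converts the PyTorch checkpoint to ONNX at load time
# (older Optimum releases used from_transformers=True instead)
ort_model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

classifier = pipeline("text-classification", model=ort_model, tokenizer=tokenizer)
print(classifier("This movie was great!"))
# ort_model.save_pretrained("onnx-distilbert-sst2")  # persist the exported graph for reuse
```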
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info