Accelerating 130,000+ Hugging Face Models with ONNX Runtime
AI Impact Summary
ONNX Runtime can accelerate inference for more than 130,000 Hugging Face models, namely those with ONNX support. These include popular architectures such as GPT-2 and BERT. The reported improvements over PyTorch can be substantial: whisper-tiny, for example, achieved a 74.30% gain. Because ONNX Runtime is tightly integrated with Hugging Face, support keeps extending to new model architectures, which makes it a key optimization path for deploying these models at scale.
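The summary does not say how a figure such as whisper-tiny's 74.30% gain was measured, so comparing backends yourself requires fixing a definition first. The following is a minimal stdlib sketch, not the methodology behind the reported number. It assumes gain means relative throughput, computed as (baseline latency / candidate latency - 1) * 100. The helpers `median_latency` and `percent_gain` are hypothetical, and the two workloads are placeholders for a PyTorch model and the equivalent ONNX Runtime session run on the same input.

```python
import statistics
import time
from typing import Callable


def median_latency(fn: Callable[[], object], warmup: int = 3, runs: int = 20) -> float:
    """Median wall-clock latency of fn() in seconds, measured after warm-up calls."""
    for _ in range(warmup):
        fn()  # warm-up calls are not timed
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    # The median is less sensitive to scheduler noise than the mean.
    return statistics.median(samples)


def percent_gain(baseline_s: float, candidate_s: float) -> float:
    """Relative throughput gain of candidate over baseline, in percent.

    ASSUMPTION: gain = (baseline latency / candidate latency - 1) * 100.
    The source does not state which convention its figures use.
    """
    if baseline_s <= 0 or candidate_s <= 0:
        raise ValueError("latencies must be positive")
    return (baseline_s / candidate_s - 1.0) * 100.0


if __name__ == "__main__":
    # Hypothetical stand-ins: in practice, wrap a PyTorch forward pass and
    # the matching ONNX Runtime session call here.
    def baseline() -> None:
        sum(i * i for i in range(200_000))

    def candidate() -> None:
        sum(i * i for i in range(100_000))

    b = median_latency(baseline)
    c = median_latency(candidate)
    print(f"baseline {b * 1e3:.2f} ms, candidate {c * 1e3:.2f} ms, gain {percent_gain(b, c):.2f}%")
```

Under this convention, a candidate at 1.0 s against a baseline at 1.743 s scores a 74.30% gain. Reported as a latency reduction, the same pair would be only about 42.6%. This is why the convention should be stated alongside any benchmark figure.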
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info