Hugging Face enables AMD Instinct MI300 support in Transformers and TGI on Azure ND MI300X v5
AI Impact Summary
Hugging Face now offers first-class integration of AMD Instinct MI300 GPUs in its platform, so large models can be deployed and benchmarked through Transformers and Text Generation Inference (TGI) without code changes. The rollout relies on Azure ND MI300X v5 virtual machines and a Kubernetes-based CI/CD workflow that runs hardware-specific pods. Reported performance gains include 8–10% lower latency for small input sequences with TunableOp, and 2x–3x lower prefill and autoregressive decoding latency for Llama 3 70B. The MI300X's 192 GB of memory also allows single-device loading and fine-tuning. Together, these changes expand production capacity on AMD hardware and lower per-model costs, since high-end models can run efficiently on MI300 devices with minimal migration work.
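The single-device claim can be checked with rough arithmetic. The sketch below is illustrative only: it assumes about 70e9 parameters for Llama 3 70B (an approximation, not a figure from the summary), counts weights only, and uses hypothetical helper names (`weight_memory_gb`, `fits_on_device`) that do not belong to any library.

```python
# Rough weight-memory estimate: do a model's weights fit on one 192 GB MI300X?
# Assumptions (not from the source): ~70e9 parameters for Llama 3 70B, and
# weights only (no KV cache, activations, gradients, or optimizer state).

# Bytes needed to store one parameter in each data type.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2, "int8": 1}

MI300X_MEMORY_GB = 192  # per-device memory cited in the summary


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


def fits_on_device(num_params: float, dtype: str,
                   device_gb: float = MI300X_MEMORY_GB) -> bool:
    """True if the weights alone are smaller than the device memory."""
    return weight_memory_gb(num_params, dtype) < device_gb


if __name__ == "__main__":
    llama3_70b_params = 70e9  # assumed round figure
    for dtype in ("float32", "float16", "int8"):
        size = weight_memory_gb(llama3_70b_params, dtype)
        verdict = "fits" if fits_on_device(llama3_70b_params, dtype) else "does not fit"
        print(f"{dtype}: ~{size:.0f} GB of weights, {verdict} in {MI300X_MEMORY_GB} GB")
```

Under these assumptions, half-precision weights take roughly 140 GB and fit within 192 GB, while full-precision weights do not. Fine-tuning needs additional memory for gradients and optimizer state, which this estimate ignores.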
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info