Hugging Face enables AMD Instinct MI300 support in Transformers and TGI on Azure
AI Impact Summary
Hugging Face has integrated first-class support for AMD Instinct MI300/MI300X hardware into its platform, enabling deployment of Transformers-based models and TGI workflows with no code changes. The work includes CI/CD and Kubernetes-based scheduling, validated against Azure ND MI300X v5 and MI250 instances, and leverages components such as Transformers, text-generation-inference, PyTorch ROCm, and TunableOp to deliver improved performance. In practice, users running Llama 3 70B and similar workloads on Azure MI300X can expect notable latency and throughput gains (2x-3x faster time-to-first-token and 2x faster autoregressive decoding), driven by optimized GEMM kernels and larger memory capacity; the 192 GB of HBM3 per device also makes single-device deployment of such models feasible. This signals stronger production portability for Hugging Face pipelines across AMD accelerators and paves the way for broader hardware-algorithm co-design benefits in fine-tuning and inference.
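As a rough illustration of the "no code changes" claim, a TGI deployment on an MI300X host follows the same Docker-based pattern as on other accelerators. The sketch below is a minimal, hedged example: the ROCm image tag, device paths, and TunableOp environment variables reflect common PyTorch ROCm conventions, but exact names and flags should be checked against the current TGI and PyTorch documentation for your versions.

```shell
# Sketch: serving Llama 3 70B with TGI on an AMD MI300X host (assumptions noted below).
# - Image tag "latest-rocm" is the conventional TGI ROCm build; verify the tag you need.
# - /dev/kfd and /dev/dri expose the AMD GPU devices to the container.
# - PYTORCH_TUNABLEOP_* variables enable TunableOp GEMM autotuning in PyTorch ROCm.
docker run --rm \
  --device=/dev/kfd --device=/dev/dri \
  --group-add video \
  --shm-size 64g \
  -p 8080:80 \
  -e PYTORCH_TUNABLEOP_ENABLED=1 \
  -v "$PWD/data:/data" \
  ghcr.io/huggingface/text-generation-inference:latest-rocm \
  --model-id meta-llama/Meta-Llama-3-70B-Instruct
```

Once the server is up, clients query it exactly as they would on NVIDIA hardware, e.g. `curl http://localhost:8080/generate` with a JSON payload; the hardware difference is absorbed by the ROCm build of the stack.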
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info