AMD 5th Gen EPYC Turin debuts; ~2x LLM inference throughput vs Genoa with ZenDNN and torch.compile
AI Impact Summary
AMD's 5th Gen EPYC Turin (Zen 5) introduces a high-core-count platform optimized for AI inference, scaling up to 192 cores and 384 threads. In benchmarks against Genoa using ZenDNN 5.0 and the zentorch PyTorch plugin with torch.compile, Turin achieves roughly 2x the decode throughput for Meta Llama 3.1 8B Instruct under multi-instance configurations, demonstrating strong Hugging Face ecosystem support on this CPU generation. This suggests tangible latency and throughput gains for LLM workloads, and an upcoming optimized Dockerfile and benchmarking tooling should accelerate adoption for production deployments.
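The zentorch integration described above works through torch.compile's pluggable backend mechanism. A minimal sketch of that pattern is shown below, assuming zentorch is installed (e.g. via pip); the toy linear model stands in for an actual LLM, and the code falls back to the default inductor backend when the plugin is absent:

```python
# Sketch: selecting the zentorch compile backend when available.
# Assumption: `import zentorch` registers a "zentorch" backend with
# torch.compile; on machines without it we fall back to "inductor".
import torch

try:
    import zentorch  # noqa: F401  # registers the ZenDNN-backed compile backend
    backend = "zentorch"
except ImportError:
    backend = "inductor"

# Toy stand-in for a real model such as Llama 3.1 8B Instruct.
model = torch.nn.Linear(16, 4).eval()
compiled = torch.compile(model, backend=backend)

with torch.inference_mode():
    out = compiled(torch.randn(2, 16))

print(tuple(out.shape))
```

The same `backend=` switch applies unchanged to a Hugging Face `transformers` model; the heavy lifting (ZenDNN kernels, graph fusion) happens inside the backend, not in user code.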
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info