AMD 5th Gen EPYC Turin debuts; ~2x LLM inference throughput vs Genoa with ZenDNN and torch.compile
AI Impact Summary
AMD's 5th Gen EPYC Turin (Zen 5) introduces a high-core-count platform optimized for AI inference, scaling up to 192 cores and 384 threads. In benchmarks against Genoa using ZenDNN 5.0 and the zentorch PyTorch plugin with torch.compile, Turin achieves roughly 2x the decode throughput for Meta Llama 3.1 8B Instruct under multi-instance configurations, demonstrating strong Hugging Face ecosystem support on this CPU generation. This suggests tangible latency and throughput gains for LLM workloads, and an upcoming optimized Dockerfile and benchmarking tooling should accelerate adoption for production deployments.
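The zentorch integration described above works through torch.compile's pluggable backend mechanism. A minimal sketch of that pattern is shown below, assuming zentorch is installed (e.g. via pip); the toy linear model stands in for an actual LLM, and the code falls back to the default inductor backend when the plugin is absent:

```python
# Sketch: selecting the zentorch compile backend when available.
# Assumption: `import zentorch` registers a "zentorch" backend with
# torch.compile; on machines without it we fall back to "inductor".
import torch

try:
    import zentorch  # noqa: F401  # registers the ZenDNN-backed compile backend
    backend = "zentorch"
except ImportError:
    backend = "inductor"

# Toy stand-in for a real model such as Llama 3.1 8B Instruct.
model = torch.nn.Linear(16, 4).eval()
compiled = torch.compile(model, backend=backend)

with torch.inference_mode():
    out = compiled(torch.randn(2, 16))

print(tuple(out.shape))
```

The same `backend=` switch applies unchanged to a Hugging Face `transformers` model; the heavy lifting (ZenDNN kernels, graph fusion) happens inside the backend, not in user code.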
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info