
AMD MI300X Custom Kernels: Llama 3.1 405B FP8 Optimization

AI Impact Summary

This work develops custom kernels for the AMD MI300X GPU to speed up inference of large language models, specifically Llama 3.1 405B in FP8, when served with vLLM. The tuned kernels cover a fused residual connection, RMS norm, and FP8 conversion step, as well as GEMM and the SwiGLU activation. To maximize throughput and efficiency, the kernels are designed around the MI300X's architecture: its compute units, its XCDs (Accelerator Complex Dies), and the thread blocks scheduled onto them.
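To make the fused residual, RMS norm, and FP8 conversion step concrete, here is a minimal pure-Python sketch of the arithmetic such a kernel computes for one row, plus a SwiGLU reference. It is a numerical reference only, not a GPU kernel and not the authors' implementation. Assumptions: per-tensor dynamic FP8 scaling; the FP8 E4M3 FNUZ format with a maximum finite value of 240 (the FP8 variant MI300X supports natively); round-to-nearest on a 3-bit mantissa that ignores subnormals; SwiGLU written as silu(gate) * up. The names `fused_residual_rmsnorm_fp8`, `round_to_e4m3`, and `swiglu` are hypothetical.

```python
import math
from typing import List, Tuple

# Assumption: MI300X's native FP8 E4M3 variant (FNUZ) has a max finite value of 240.
FP8_E4M3_FNUZ_MAX = 240.0
MANTISSA_BITS = 3  # E4M3 stores 3 explicit mantissa bits


def round_to_e4m3(x: float) -> float:
    """Round x to the nearest value with a 3-bit mantissa.

    Simplification: ignores subnormals and the exponent range, so callers
    must clamp to the FP8 range first.
    """
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)  # x = m * 2**e with 0.5 <= |m| < 1
    steps = 2 ** (MANTISSA_BITS + 1)  # implicit leading bit + stored bits
    return math.ldexp(round(m * steps) / steps, e)


def fused_residual_rmsnorm_fp8(
    x: List[float],
    residual: List[float],
    weight: List[float],
    eps: float = 1e-6,
    fp8_max: float = FP8_E4M3_FNUZ_MAX,
) -> Tuple[List[float], float, List[float]]:
    """Reference math for one row: residual add -> RMS norm -> FP8 quantize.

    Returns (fp8_values, scale, new_residual); dequantize with q * scale.
    """
    # 1) Residual add; the sum is also the residual for the next layer.
    h = [a + b for a, b in zip(x, residual)]
    # 2) RMS norm: divide by the root-mean-square, then apply the learned weight.
    rms = math.sqrt(sum(v * v for v in h) / len(h) + eps)
    y = [v / rms * w for v, w in zip(h, weight)]
    # 3) Per-tensor dynamic scaling so the largest magnitude maps to fp8_max.
    amax = max(abs(v) for v in y)
    scale = amax / fp8_max if amax > 0.0 else 1.0
    q = [round_to_e4m3(max(-fp8_max, min(fp8_max, v / scale))) for v in y]
    return q, scale, h


def swiglu(gate: List[float], up: List[float]) -> List[float]:
    """SwiGLU activation: silu(gate) * up, element-wise."""
    def silu(g: float) -> float:
        # Numerically stable g * sigmoid(g).
        if g >= 0.0:
            return g / (1.0 + math.exp(-g))
        e = math.exp(g)
        return g * e / (1.0 + e)
    return [silu(g) * u for g, u in zip(gate, up)]


if __name__ == "__main__":
    q, scale, h = fused_residual_rmsnorm_fp8(
        [1.0, 2.0, 3.0, 4.0], [0.5, 0.5, 0.5, 0.5], [1.0, 1.0, 1.0, 1.0]
    )
    print(q, scale, h)
    print(swiglu([1.0, -2.0], [3.0, 4.0]))
```

The point of fusing is that a kernel can do all three steps in one pass per row, keeping the summed residual on-chip instead of writing it to memory and reading it back for the norm and the FP8 cast; this sketch only pins down the arithmetic such a kernel would be expected to match.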

Affected Systems

AMD MI300X, vLLM

Date: Not specified
Change type: capability
Severity: info

