Modular achieves SOTA on AMD MI355 in 14 Days — Matmul Optimization
AI Impact Summary
Modular reached state-of-the-art performance on AMD MI355 hardware in 14 days by building on a software stack designed for rapid AI hardware bring-up. The key was optimizing matmul operations: a 500-line kernel was adapted to take advantage of new MI355 features such as FP32-to-BF16 conversion and larger tensor-core tile sizes. The result illustrates how a portable, architecture-agnostic software foundation can shorten the time needed to adopt new hardware.
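To make the two features named above concrete, here is a minimal Python sketch. It is not Modular's kernel and does not use any AMD or Modular API. It only illustrates what an FP32-to-BF16 conversion does numerically (assuming round-to-nearest-even, which the summary does not specify) and how a tile size can be a tunable parameter of a blocked matmul. All function names are hypothetical, and accumulation uses Python floats as a stand-in for a higher-precision accumulator.

```python
import struct


def fp32_to_bf16_bits(x: float) -> int:
    """Return the 16-bit BF16 pattern for x (assumed round-to-nearest-even)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # NaN: truncate and force a mantissa bit so the result stays NaN.
    if (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF):
        return (bits >> 16) | 0x0040
    # Add 0x7FFF plus the lowest kept bit, so exact ties round to even.
    rounding = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding) >> 16) & 0xFFFF


def bf16_bits_to_fp32(b: int) -> float:
    """Widen a BF16 bit pattern back to a float by zero-filling the low 16 bits."""
    return struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))[0]


def round_to_bf16(x: float) -> float:
    """Round-trip a value through BF16 to expose the precision loss."""
    return bf16_bits_to_fp32(fp32_to_bf16_bits(x))


def tiled_matmul(a, b, tile=2):
    """Blocked matmul over lists of lists; inputs are rounded to BF16,
    products are accumulated in full Python float precision."""
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for i0 in range(0, m, tile):
        for j0 in range(0, n, tile):
            for k0 in range(0, k, tile):
                # One (tile x tile) block of C updated from one K-slice.
                for i in range(i0, min(i0 + tile, m)):
                    for j in range(j0, min(j0 + tile, n)):
                        acc = c[i][j]
                        for kk in range(k0, min(k0 + tile, k)):
                            acc += round_to_bf16(a[i][kk]) * round_to_bf16(b[kk][j])
                        c[i][j] = acc
    return c
```

Because `tile` is an ordinary parameter, the same loop nest yields the same result for any tile size; in a real GPU kernel the tile shape would instead be chosen to match the hardware's matrix instructions.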
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info