Modular MAX 25.2: Multi-GPU H200/H100 Support & CUDA-Free Inference
AI Impact Summary
Modular MAX 25.2 introduces multi-GPU support for NVIDIA H100 and H200 hardware, enabling deployment of large language models such as Llama-3.3-70B-Instruct across multiple GPUs. The release expands model support to over 500 preconfigured models and adds GPTQ quantization along with optimized LLM serving techniques (batch scheduling, in-flight batching, and copy-on-write KV blocks) to improve performance and reduce total cost of ownership (TCO). A slim Docker container (1.3 GB compressed) further accelerates deployment, eliminating CUDA dependencies and offering a simplified GPU programming experience with Mojo.
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info