
Modular: Day Zero Launch: Gemma 4 performance on NVIDIA and AMD

AI Impact Summary

Google DeepMind’s Gemma 4 models are available for immediate deployment on Modular Cloud, running on both NVIDIA and AMD hardware. Benchmarks show 15% higher throughput than vLLM on NVIDIA B200 GPUs, a gain credited to Modular’s MAX inference framework. Combined with native multimodal support and a 256K context window, this day-zero availability lets developers scale demanding applications quickly.

Affected Systems

Gemma 4, Modular Cloud

Date: not specified
Change type: capability
Severity: medium

