Gemma 4 frontier multimodal models enable on-device inference under the Apache 2.0 license
AI Impact Summary
Gemma 4 extends on-device multimodal inference to image, text, and audio, with Apache 2.0 licensing and availability on Hugging Face, expanding edge AI capabilities. The family spans E2B, E4B, 31B dense, and 26B MoE configurations, and uses Per-Layer Embeddings and a shared KV cache to balance long-context length against memory and compute. This enables private, low-latency inference at the edge, with tooling compatibility across transformers, llama.cpp, MLX, WebGPU, and Rust; selecting a model size, however, requires careful hardware planning and integration with on-device deployment pipelines.
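The hardware-planning point can be sketched as a simple capacity check: pick the largest variant whose weights fit in the device's memory budget. The variant names come from the summary above, but the footprint figures and the `pick_variant` helper are hypothetical illustrations, not published requirements; substitute measured numbers for your quantization and runtime.

```python
# Hypothetical helper for choosing a Gemma 4 variant by available device
# memory. The footprint figures are illustrative placeholders, NOT
# published numbers.

# Rough quantized weight footprints in GiB (illustrative only).
VARIANT_FOOTPRINT_GIB = {
    "E2B": 2.0,
    "E4B": 4.0,
    "26B-MoE": 16.0,
    "31B-dense": 18.0,
}

def pick_variant(available_gib: float, headroom_gib: float = 1.5) -> str:
    """Return the largest variant whose weights fit after reserving
    headroom for the KV cache, activations, and the OS."""
    budget = available_gib - headroom_gib
    fitting = [
        (gib, name)
        for name, gib in VARIANT_FOOTPRINT_GIB.items()
        if gib <= budget
    ]
    if not fitting:
        raise ValueError(f"no variant fits in {available_gib} GiB")
    # max() compares by footprint first, so this picks the largest fit.
    return max(fitting)[1]
```

With the placeholder figures, an 8 GiB device would land on E4B, while a 32 GiB workstation could take the 31B dense model; real deployments should also budget for context length, since the KV cache grows with it.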
Affected Systems
- Date: not specified
- Change type: capability
- Severity: medium