Google EmbeddingGemma: 308M-parameter multilingual on-device embedding model with 2k context
AI Impact Summary
EmbeddingGemma is Google's compact multilingual embedding model (308M parameters), optimized for on-device inference with a 2k-token context window, enabling fast retrieval and retrieval-augmented workflows on mobile devices. It uses a Gemma 3-based encoder with bidirectional attention and produces 768-dimensional embeddings that can be truncated to 512, 256, or 128 dimensions as needed. It supports 100+ languages, expanding on-device multilingual search and fine-tuning potential (e.g., on domain-specific corpora like MIRIAD). The model is designed to plug into popular toolchains (Sentence Transformers, LangChain, LlamaIndex, Haystack, txtai, Transformers.js, TEI, ONNX), which broadens adoption without requiring API calls. Moving embedding inference on-device can reduce latency, lower bandwidth usage, and improve privacy for multilingual RAG, semantic search, and code-search use cases.
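The truncatable 768/512/256/128-dimensional outputs described above follow the Matryoshka pattern: a smaller vector is obtained by keeping the leading components and re-normalizing before computing cosine similarity. A minimal sketch with NumPy, using a random stand-in vector rather than a real model call; the helper name `truncate_embedding` is illustrative, not part of any library API:

```python
import numpy as np

def truncate_embedding(vec, dim):
    """Matryoshka-style truncation: keep the first `dim` components,
    then re-normalize so cosine similarity remains meaningful."""
    v = np.asarray(vec, dtype=np.float64)[:dim]
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

# Stand-in for a full 768-d EmbeddingGemma output vector.
rng = np.random.default_rng(0)
full = rng.normal(size=768)
full /= np.linalg.norm(full)

# The smaller sizes the model card lists as supported truncations.
for d in (512, 256, 128):
    t = truncate_embedding(full, d)
    print(d, t.shape[0], round(float(np.linalg.norm(t)), 6))
```

In practice a framework such as Sentence Transformers would handle this internally when configured with a target dimensionality; the point here is only that truncation must be paired with re-normalization.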
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info