Google EmbeddingGemma: 308M-parameter on-device multilingual embedding model with 2K context
AI Impact Summary
Google releases EmbeddingGemma, a 308M-parameter multilingual embedding model optimized for on-device use with a 2K-token context window. It enables fast, memory-efficient embeddings for retrieval-augmented generation (RAG), semantic search, and mobile/edge RAG pipelines. Its 768-dimensional output embeddings can be truncated to 512, 256, or 128 dimensions via Matryoshka Representation Learning (MRL), and Google claims RAM usage under 200 MB when quantized, a tangible reduction in on-device resource needs. The model integrates with popular tooling such as Sentence Transformers, LangChain, LlamaIndex, Haystack, txtai, Transformers.js, Text Embeddings Inference, and ONNX, easing migration for teams already using these stacks.
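To make the integration concrete, here is a minimal sketch of semantic search with MRL truncation via Sentence Transformers. The Hugging Face model id `google/embeddinggemma-300m` and the choice of 256 dimensions are assumptions for illustration; check the official model card for the exact identifier and any recommended query/document prompts.

```python
# Minimal sketch: semantic search with EmbeddingGemma via Sentence Transformers.
# Assumes the model id "google/embeddinggemma-300m" (verify on the Hugging Face
# model card) and sentence-transformers >= 3.x for truncate_dim / similarity().
from sentence_transformers import SentenceTransformer

# truncate_dim uses MRL to keep only the first 256 of the 768 output dimensions,
# trading a little accuracy for a roughly 3x smaller index footprint.
model = SentenceTransformer("google/embeddinggemma-300m", truncate_dim=256)

documents = [
    "EmbeddingGemma is a 308M-parameter multilingual embedding model.",
    "The model targets on-device retrieval-augmented generation.",
]
query = "Which embedding model is designed for on-device RAG?"

doc_embeddings = model.encode(documents)   # shape: (2, 256)
query_embedding = model.encode([query])    # shape: (1, 256)

# Cosine-similarity scores between the query and each document.
scores = model.similarity(query_embedding, doc_embeddings)
best = scores.argmax().item()
print(f"Best match (score {scores[0][best]:.3f}): {documents[best]}")
```

The same pattern applies at other MRL cut points (512 or 128): only `truncate_dim` changes, so index size can be tuned per device without retraining or re-chunking documents.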
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info