Google EmbeddingGemma: 308M-parameter multilingual on-device embedding model with 2k context
AI Impact Summary
EmbeddingGemma is Google's compact multilingual embedding model (308M parameters), optimized for on-device inference with a 2k-token context window, enabling fast retrieval and retrieval-augmented workflows on mobile devices. It uses a Gemma 3-based encoder with bidirectional attention and produces 768-dimensional embeddings that can be truncated to 512, 256, or 128 dimensions as needed. It supports 100+ languages, expanding on-device multilingual search and fine-tuning potential (e.g., on domain-specific corpora like MIRIAD). The model is designed to plug into popular toolchains (Sentence Transformers, LangChain, LlamaIndex, Haystack, txtai, Transformers.js, TEI, ONNX), which broadens adoption without requiring API calls. Moving embedding inference on-device can reduce latency, lower bandwidth usage, and improve privacy for multilingual RAG, semantic search, and code-search use cases.
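The truncatable 768/512/256/128-dimensional outputs described above follow the Matryoshka pattern: a smaller vector is obtained by keeping the leading components and re-normalizing before computing cosine similarity. A minimal sketch with NumPy, using a random stand-in vector rather than a real model call; the helper name `truncate_embedding` is illustrative, not part of any library API:

```python
import numpy as np

def truncate_embedding(vec, dim):
    """Matryoshka-style truncation: keep the first `dim` components,
    then re-normalize so cosine similarity remains meaningful."""
    v = np.asarray(vec, dtype=np.float64)[:dim]
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

# Stand-in for a full 768-d EmbeddingGemma output vector.
rng = np.random.default_rng(0)
full = rng.normal(size=768)
full /= np.linalg.norm(full)

# The smaller sizes the model card lists as supported truncations.
for d in (512, 256, 128):
    t = truncate_embedding(full, d)
    print(d, t.shape[0], round(float(np.linalg.norm(t)), 6))
```

In practice a framework such as Sentence Transformers would handle this internally when configured with a target dimensionality; the point here is only that truncation must be paired with re-normalization.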
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info