Google EmbeddingGemma: 308M-parameter on-device multilingual embedding model with 2K context
AI Impact Summary
Google releases EmbeddingGemma, a 308M-parameter multilingual embedding model optimized for on-device use with a 2K-token context window. It enables fast, memory-efficient embeddings for retrieval-augmented generation (RAG), semantic search, and mobile/edge RAG pipelines. Its 768-dimensional output embeddings can be truncated to 512, 256, or 128 dimensions via Matryoshka Representation Learning (MRL), and Google claims RAM usage under 200 MB when quantized, a tangible reduction in on-device resource needs. The model integrates with popular tooling such as Sentence Transformers, LangChain, LlamaIndex, Haystack, txtai, Transformers.js, Text Embeddings Inference, and ONNX, easing migration for teams already using these stacks.
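To make the integration concrete, here is a minimal sketch of semantic search with MRL truncation via Sentence Transformers. The Hugging Face model id `google/embeddinggemma-300m` and the choice of 256 dimensions are assumptions for illustration; check the official model card for the exact identifier and any recommended query/document prompts.

```python
# Minimal sketch: semantic search with EmbeddingGemma via Sentence Transformers.
# Assumes the model id "google/embeddinggemma-300m" (verify on the Hugging Face
# model card) and sentence-transformers >= 3.x for truncate_dim / similarity().
from sentence_transformers import SentenceTransformer

# truncate_dim uses MRL to keep only the first 256 of the 768 output dimensions,
# trading a little accuracy for a roughly 3x smaller index footprint.
model = SentenceTransformer("google/embeddinggemma-300m", truncate_dim=256)

documents = [
    "EmbeddingGemma is a 308M-parameter multilingual embedding model.",
    "The model targets on-device retrieval-augmented generation.",
]
query = "Which embedding model is designed for on-device RAG?"

doc_embeddings = model.encode(documents)   # shape: (2, 256)
query_embedding = model.encode([query])    # shape: (1, 256)

# Cosine-similarity scores between the query and each document.
scores = model.similarity(query_embedding, doc_embeddings)
best = scores.argmax().item()
print(f"Best match (score {scores[0][best]:.3f}): {documents[best]}")
```

The same pattern applies at other MRL cut points (512 or 128): only `truncate_dim` changes, so index size can be tuned per device without retraining or re-chunking documents.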
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info