Engineered for efficiency
A 308M-parameter model that runs in under 200MB of RAM with quantization.
Explore EmbeddingGemma
EmbeddingGemma generates high-quality embeddings with reduced resource consumption, enabling on-device Retrieval Augmented Generation (RAG) pipelines, semantic search, and generative AI applications that can run on everyday devices.
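The retrieval step of such a pipeline boils down to ranking documents by similarity between their embeddings and the query embedding. A minimal, illustrative sketch with toy vectors standing in for real model output (in practice the vectors would come from EmbeddingGemma; the values and document names here are made up):

```python
import numpy as np

# Toy corpus embeddings standing in for model output (illustrative values only).
docs = {
    "doc_a": np.array([0.9, 0.1, 0.0]),
    "doc_b": np.array([0.1, 0.9, 0.0]),
    "doc_c": np.array([0.0, 0.1, 0.9]),
}
query = np.array([0.8, 0.2, 0.0])

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Retrieval step of a RAG pipeline: rank documents by similarity to the query.
ranked = sorted(docs, key=lambda name: cosine(query, docs[name]), reverse=True)
print(ranked[0])  # doc_a is the closest match
```

In a real on-device pipeline, the top-ranked passages would then be passed as context to a generative model.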
Trained on over 100 languages, providing best-in-class text understanding for its size.
Leverages Matryoshka Representation Learning (MRL) for customizable embedding dimensions, letting you trade embedding size for quality as your storage and latency budget requires.
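With MRL-trained embeddings, a smaller embedding is obtained by keeping the leading dimensions of the full vector and re-normalizing. A minimal sketch of that idea (`truncate_embedding` and the dimension sizes are illustrative, not part of any EmbeddingGemma API):

```python
import numpy as np

def truncate_embedding(vec, dim):
    """Keep the first `dim` dimensions of an MRL-trained embedding
    and re-normalize so cosine similarity stays meaningful."""
    small = np.asarray(vec, dtype=np.float32)[:dim]
    norm = np.linalg.norm(small)
    return small / norm if norm > 0 else small

# Example: shrink a hypothetical full-size embedding to 256 dimensions.
full = np.random.default_rng(0).normal(size=768)
small = truncate_embedding(full, 256)
print(small.shape)  # (256,)
```

Because the leading dimensions carry the most information, the truncated vector can be used directly for search, at lower storage and compute cost.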