AraGemma-Embedding-300m

Model Page: AraGemma-Embedding (Hugging Face)

Authors: Google DeepMind (base model); fine-tuned by Omartificial-Intelligence-Space

Find more: Arabic Semantic Embedding Models


Example notebook for simple RAG and other NLP tasks:

RAG & NLP Tasks Notebook


Model Overview

AraGemma-Embedding-300m is a fine-tuned version of EmbeddingGemma-300M, optimized for Arabic semantic understanding.
This model was fine-tuned on 1 million Arabic triplets (anchor, positive, negative) with Matryoshka Representation Learning (MRL) to enhance semantic similarity, clustering, classification, and retrieval for Arabic texts.
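
The exact training script is not published here; as a rough illustration, the following is a minimal sketch of how triplet fine-tuning with MRL is typically wired up in Sentence Transformers. The dataset name, dimension list, and base-model id are illustrative assumptions, not the authors' actual configuration:

from datasets import load_dataset
from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer
from sentence_transformers.losses import MultipleNegativesRankingLoss, MatryoshkaLoss

# Assumed starting point: Google's published EmbeddingGemma checkpoint
model = SentenceTransformer("google/embeddinggemma-300m")

# Hypothetical triplet dataset with (anchor, positive, negative) columns
train_dataset = load_dataset("your-org/arabic-triplets", split="train")  # placeholder name

# Wrap a ranking loss in MatryoshkaLoss so the leading 768/512/256/128
# dimensions of each embedding are all trained to be useful on their own
base_loss = MultipleNegativesRankingLoss(model)
loss = MatryoshkaLoss(model, base_loss, matryoshka_dims=[768, 512, 256, 128])

trainer = SentenceTransformerTrainer(model=model, train_dataset=train_dataset, loss=loss)
trainer.train()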

It builds on Google’s Gemma 3 research, making it lightweight, efficient, and deployable on-device (mobile, laptops, desktops) while delivering strong Arabic semantic embedding performance (see the benchmarks below).


Model Information

Input

  • Text string (Arabic or multilingual)
  • Maximum context length: 2048 tokens

Output

  • Dense vector representation of size 768
  • Supports MRL truncation to 512, 256, or 128 dimensions with re-normalization (see the sketch below)
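
A minimal sketch of what MRL truncation looks like in practice: keep the leading dimensions of the full 768-dimensional vector, then re-normalize to unit length. NumPy is used for illustration and the helper name is ours:

import numpy as np

def truncate_embedding(embedding: np.ndarray, dim: int = 256) -> np.ndarray:
    """Keep the first `dim` Matryoshka dimensions and re-normalize to unit length."""
    truncated = embedding[:dim]
    return truncated / np.linalg.norm(truncated)

# Example: shrink a 768-d embedding to 256 dimensions
full = np.random.rand(768).astype(np.float32)  # stand-in for model.encode(...)
small = truncate_embedding(full, dim=256)
print(small.shape)  # (256,)

Recent Sentence Transformers versions also expose a truncate_dim argument on the model constructor, which performs the same truncation automatically at encode time.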

Performance

Benchmark Results

The fine-tuned model shows significant improvements over the base EmbeddingGemma-300M, reflecting stronger Arabic semantic understanding.

Performance Compared with Other Arabic Embedding Models

| Model | Dim | # Params | STS17 | STS22-v2 | Average |
|---|---|---|---|---|---|
| Arabic-Triplet-Matryoshka-V2 | 768 | 135M | 85 | 64 | 75 |
| Arabert-all-nli-triplet-Matryoshka | 768 | 135M | 83 | 64 | 74 |
| AraGemma-Embedding-300m | 768 | 303M | 84 | 62 | 73 |
| GATE-AraBert-V1 | 767 | 135M | 83 | 63 | 73 |
| Marbert-all-nli-triplet-Matryoshka | 768 | 163M | 82 | 61 | 72 |
| Arabic-labse-Matryoshka | 768 | 471M | 82 | 61 | 72 |
| AraEuroBert-Small | 768 | 210M | 80 | 61 | 71 |
| E5-all-nli-triplet-Matryoshka | 384 | 278M | 80 | 60 | 70 |
| text-embedding-3-large | 3072 | - | 81 | 59 | 70 |
| Arabic-all-nli-triplet-Matryoshka | 768 | 135M | 82 | 54 | 68 |
| AraEuroBert-Mid | 1151 | 610M | 83 | 53 | 68 |
| paraphrase-multilingual-mpnet-base-v2 | 768 | 135M | 79 | 55 | 67 |
| AraEuroBert-Large | 2304 | 2.1B | 79 | 55 | 67 |
| text-embedding-ada-002 | 1536 | - | 71 | 62 | 66 |
| text-embedding-3-small | 1536 | - | 72 | 57 | 65 |

Usage

This model is compatible with Sentence Transformers and Hugging Face Transformers.

from sentence_transformers import SentenceTransformer
import torch
from torch.nn.functional import cosine_similarity

# Load the Arabic-optimized embedding model
model = SentenceTransformer("Omartificial-Intelligence-Space/AraGemma-Embedding-300m")

# Example: Arabic semantic similarity
query = "ما هو الكوكب الأحمر؟"  # "What is the red planet?"
documents = [
    "الزهرة تشبه الأرض في الحجم والقرب.",  # "Venus resembles Earth in size and proximity."
    "المريخ يسمى بالكوكب الأحمر بسبب لونه المميز.",  # "Mars is called the red planet because of its distinctive color."
    "المشتري أكبر كواكب المجموعة الشمسية.",  # "Jupiter is the largest planet in the solar system."
    "زحل يتميز بحلقاته الشهيرة."  # "Saturn is known for its famous rings."
]

query_embedding = model.encode(query)      # shape: (768,)
doc_embeddings = model.encode(documents)   # shape: (4, 768)

# Compute cosine similarity between the query and each document
query_tensor = torch.tensor(query_embedding)
doc_tensors = torch.tensor(doc_embeddings)
similarities = cosine_similarity(query_tensor.unsqueeze(0), doc_tensors)

print(similarities)  # the Mars sentence should score highest
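
If you are on Sentence Transformers 3.0 or newer, the manual torch step above can be replaced with the built-in similarity helper (assuming that version is available in your environment):

# model.similarity defaults to cosine similarity in sentence-transformers >= 3.0
similarities = model.similarity(query_embedding, doc_embeddings)
print(similarities)  # shape (1, 4): one score per document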

Applications

  • Semantic Chunking for RAG (Retrieval-Augmented Generation)
  • Semantic Search & Retrieval (Arabic focus; see the sketch after this list)
  • Clustering and Classification of Arabic documents
  • Cross-lingual retrieval (multilingual data supported)
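
As a concrete sketch of the search and retrieval use case, here is a minimal top-k semantic search over a pre-encoded corpus. The search helper and corpus contents are illustrative, not part of the model's API:

import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Omartificial-Intelligence-Space/AraGemma-Embedding-300m")

corpus = [
    "المريخ يسمى بالكوكب الأحمر بسبب لونه المميز.",  # "Mars is called the red planet..."
    "المشتري أكبر كواكب المجموعة الشمسية.",  # "Jupiter is the largest planet..."
]
# Pre-encode the corpus once; normalizing makes dot product equal cosine similarity
corpus_embeddings = model.encode(corpus, normalize_embeddings=True)

def search(query: str, top_k: int = 2):
    """Return the top_k corpus entries most similar to the query."""
    query_embedding = model.encode(query, normalize_embeddings=True)
    scores = corpus_embeddings @ query_embedding  # cosine scores via dot product
    best = np.argsort(-scores)[:top_k]
    return [(corpus[i], float(scores[i])) for i in best]

print(search("ما هو الكوكب الأحمر؟"))  # "What is the red planet?"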

Limitations

  • Embedding activations do not support float16; use float32 or bfloat16 instead (a loading sketch follows).
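
A minimal sketch of loading the model in a supported precision; the model_kwargs pass-through is a standard Sentence Transformers option, assumed available in recent versions:

import torch
from sentence_transformers import SentenceTransformer

# float16 is not supported for the embedding activations; load the weights
# in bfloat16 (or keep the default float32) instead.
model = SentenceTransformer(
    "Omartificial-Intelligence-Space/AraGemma-Embedding-300m",
    model_kwargs={"torch_dtype": torch.bfloat16},
)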

Citation

If you use this model in your work, please cite:

@misc{AraGemmaEmbedding2025,
  title={AraGemma-Embedding: Fine-tuned EmbeddingGemma for Arabic Semantic Understanding},
  author={Omartificial-Intelligence-Space},
  year={2025},
  url={https://huggingface.co/Omartificial-Intelligence-Space/AraGemma-Embedding-300m}
}