Redis fine-tuned BiEncoder model for semantic caching on LangCache
This is a sentence-transformers model finetuned from Alibaba-NLP/gte-modernbert-base on the LangCache Sentence Pairs (all) dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for sentence pair similarity.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: Alibaba-NLP/gte-modernbert-base
- Maximum Sequence Length: 100 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
- Training Dataset: LangCache Sentence Pairs (all)
- Language: en
- License: apache-2.0
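The similarity function is plain cosine similarity over the 768-dimensional embeddings. A minimal NumPy sketch of the score it reports (illustrative only, not the sentence-transformers implementation):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: dot product of the vectors divided by the
    product of their norms. Ranges from -1 to 1; 1 means identical direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

u = np.array([1.0, 0.0, 1.0])
v = np.array([1.0, 0.0, 0.0])
print(round(cosine_similarity(u, v), 4))  # 0.7071
```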
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
(0): Transformer({'max_seq_length': 100, 'do_lower_case': False, 'architecture': 'ModernBertModel'})
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
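The pooling module above uses CLS pooling (`pooling_mode_cls_token: True`): the sentence embedding is simply the transformer's hidden state for the first token. A minimal sketch, assuming token embeddings of shape `(seq_len, 768)`:

```python
import numpy as np

def cls_pool(token_embeddings: np.ndarray) -> np.ndarray:
    """CLS pooling: take the hidden state of the first ([CLS]) token
    as the sentence embedding, matching pooling_mode_cls_token=True."""
    # token_embeddings has shape (seq_len, hidden_dim)
    return token_embeddings[0]

tokens = np.random.rand(100, 768)  # up to max_seq_length = 100 tokens
sentence_embedding = cls_pool(tokens)
print(sentence_embedding.shape)  # (768,)
```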
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("redis/langcache-embed-v3")
# Run inference
sentences = [
'He was a close friend of Ángel Cabrera and is a cousin of golfer Tony Croatto .',
'He was a close friend of Ángel Cabrera , and is a cousin of golfer Tony Croatto .',
'UWIRE also distributes its members content to professional media outlets , including Yahoo , CNN and CBS News .',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[0.9922, 0.9922, 0.5352],
# [0.9922, 0.9961, 0.5391],
# [0.5352, 0.5391, 1.0000]], dtype=torch.bfloat16)
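In a semantic-caching setup, similarity scores like these gate cache hits: a new query is served from the cache only if its embedding is close enough to a cached one. The helper below is a hypothetical illustration (the name `cache_lookup` and the 0.9 threshold are assumptions, not part of LangCache's API), using toy 2-d vectors in place of real embeddings:

```python
import numpy as np

def cache_lookup(query_emb: np.ndarray, cached_embs: np.ndarray, threshold: float = 0.9):
    """Return the index of the most similar cached entry if its cosine
    similarity clears the threshold, else None (a cache miss)."""
    q = query_emb / np.linalg.norm(query_emb)
    c = cached_embs / np.linalg.norm(cached_embs, axis=1, keepdims=True)
    sims = c @ q                      # cosine similarity to every cached entry
    best = int(np.argmax(sims))
    return best if sims[best] >= threshold else None

cached = np.array([[1.0, 0.0], [0.0, 1.0]])   # stand-ins for cached query embeddings
print(cache_lookup(np.array([0.95, 0.05]), cached))  # 0 (cache hit)
print(cache_lookup(np.array([0.7, 0.7]), cached))    # None (cache miss)
```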
Evaluation
Metrics
Information Retrieval
- Dataset: `train`
- Evaluated with `InformationRetrievalEvaluator`
| Metric | Value |
|---|---|
| cosine_accuracy@1 | 0.558 |
| cosine_precision@1 | 0.558 |
| cosine_recall@1 | 0.536 |
| cosine_ndcg@10 | 0.7524 |
| cosine_mrr@1 | 0.558 |
| cosine_map@100 | 0.6976 |
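Recall@1 can fall below precision@1 because some queries have more than one relevant document, so a single correct result at rank 1 covers only part of the relevant set. A small sketch of the two metrics for one query (hypothetical helper, not the evaluator's code):

```python
def precision_recall_at_1(ranked_ids, relevant_ids):
    """Precision@1 and recall@1 for a single query.
    precision@1: is the top-ranked result relevant (0 or 1)?
    recall@1: what fraction of the relevant set did the top result cover?"""
    hit = 1.0 if ranked_ids[0] in relevant_ids else 0.0
    return hit, hit / len(relevant_ids)

# Query with two relevant documents; the top-ranked result is one of them.
p, r = precision_recall_at_1(["d3", "d1", "d7"], {"d3", "d9"})
print(p, r)  # 1.0 0.5
```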
Training Details
Training Dataset
LangCache Sentence Pairs (all)
- Dataset: LangCache Sentence Pairs (all)
- Size: 26,850 training samples
- Columns: `sentence1`, `sentence2`, and `label`
- Approximate statistics based on the first 1000 samples:

| | sentence1 | sentence2 | label |
|---|---|---|---|
| type | string | string | int |
| details | min: 8 tokens, mean: 27.35 tokens, max: 53 tokens | min: 8 tokens, mean: 27.27 tokens, max: 52 tokens | 1: 100.00% |

- Samples:

| sentence1 | sentence2 | label |
|---|---|---|
| The newer Punts are still very much in existence today and race in the same fleets as the older boats . | The newer punts are still very much in existence today and run in the same fleets as the older boats . | 1 |
| After losing his second election , he resigned as opposition leader and was replaced by Geoff Pearsall . | Max Bingham resigned as opposition leader after losing his second election , and was replaced by Geoff Pearsall . | 1 |
| The 12F was officially homologated on August 21 , 1929 and exhibited at the Paris Salon in 1930 . | The 12F was officially homologated on 21 August 1929 and displayed at the 1930 Paris Salon . | 1 |

- Loss: `CachedMultipleNegativesSymmetricRankingLoss` with these parameters: `{ "scale": 20.0, "similarity_fct": "cos_sim", "mini_batch_size": 64, "gather_across_devices": false }`
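This loss treats the other in-batch positives as negatives and averages an in-batch cross-entropy ranking objective over both directions (anchor→positive and positive→anchor). A simplified NumPy sketch of the symmetric objective, omitting the gradient caching and mini-batching that the `Cached` variant adds:

```python
import numpy as np

def symmetric_mnr_loss(sim: np.ndarray, scale: float = 20.0) -> float:
    """Symmetric multiple-negatives ranking loss (sketch).
    sim is the (batch, batch) cosine-similarity matrix between anchors
    and positives; the diagonal holds the true pairs. Cross-entropy is
    applied row-wise in both directions and averaged."""
    logits = sim * scale
    labels = np.arange(len(logits))

    def cross_entropy(l):
        l = l - l.max(axis=1, keepdims=True)  # for numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[labels, labels].mean()

    # Average the anchor->positive and positive->anchor directions.
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))

batch_sim = np.eye(4)  # ideal batch: each anchor matches only its own positive
print(float(symmetric_mnr_loss(batch_sim)))  # ~0 for a perfectly separated batch
```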
Evaluation Dataset
LangCache Sentence Pairs (all)
- Dataset: LangCache Sentence Pairs (all)
- Size: 26,850 evaluation samples
- Columns: `sentence1`, `sentence2`, and `label`
- Approximate statistics based on the first 1000 samples:

| | sentence1 | sentence2 | label |
|---|---|---|---|
| type | string | string | int |
| details | min: 8 tokens, mean: 27.35 tokens, max: 53 tokens | min: 8 tokens, mean: 27.27 tokens, max: 52 tokens | 1: 100.00% |

- Samples:

| sentence1 | sentence2 | label |
|---|---|---|
| The newer Punts are still very much in existence today and race in the same fleets as the older boats . | The newer punts are still very much in existence today and run in the same fleets as the older boats . | 1 |
| After losing his second election , he resigned as opposition leader and was replaced by Geoff Pearsall . | Max Bingham resigned as opposition leader after losing his second election , and was replaced by Geoff Pearsall . | 1 |
| The 12F was officially homologated on August 21 , 1929 and exhibited at the Paris Salon in 1930 . | The 12F was officially homologated on 21 August 1929 and displayed at the 1930 Paris Salon . | 1 |

- Loss: `CachedMultipleNegativesSymmetricRankingLoss` with these parameters: `{ "scale": 20.0, "similarity_fct": "cos_sim", "mini_batch_size": 64, "gather_across_devices": false }`
Training Logs
| Epoch | Step | train_cosine_ndcg@10 |
|---|---|---|
| -1 | -1 | 0.7524 |
Framework Versions
- Python: 3.12.3
- Sentence Transformers: 5.1.0
- Transformers: 4.56.0
- PyTorch: 2.8.0+cu128
- Accelerate: 1.10.1
- Datasets: 4.0.0
- Tokenizers: 0.22.0
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
Model tree for redis/langcache-embed-v3
- Base model: answerdotai/ModernBERT-base
- Finetuned from: Alibaba-NLP/gte-modernbert-base
- Dataset used for training: LangCache Sentence Pairs (all)