jinaai/jina-embedding-l-en-v1


bwang0911 committed on Jul 16, 2023

Commit 64bc1c6 · 1 Parent(s): 8ba26fe

Update README.md

Files changed (1)

README.md +30 -0

@@ -1783,6 +1783,36 @@ embeddings = finetuner.encode(
 print(finetuner.cos_sim(embeddings[0], embeddings[1]))
 ```
+Use directly with Huggingface Transformers:
+```python
+import torch
+from transformers import AutoModel, AutoTokenizer
+def mean_pooling(model_output, attention_mask):
+    token_embeddings = model_output[0]
+    input_mask_expanded = (
+        attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
+    )
+    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(
+        input_mask_expanded.sum(1), min=1e-9
+    )
+sentences = ['how is the weather today', 'What is the current weather like today?']
+# Load model from HuggingFace Hub
+tokenizer = AutoTokenizer.from_pretrained('jinaai/jina-embedding-l-en-v1')
+model = AutoModel.from_pretrained('jinaai/jina-embedding-l-en-v1')
+with torch.inference_mode():
+    encoded_input = tokenizer(
+        sentences, padding=True, truncation=True, return_tensors='pt'
+    )
+    model_output = model.encoder(**encoded_input)
+    embeddings = mean_pooling(model_output, encoded_input['attention_mask'])
+```
 ## Fine-tuning
 Please consider [Finetuner](https://github.com/jina-ai/finetuner).
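
The `mean_pooling` helper added in this commit can be sanity-checked without downloading the model. The following is a minimal sketch on made-up toy tensors, not part of the commit: it takes the token embeddings directly instead of the model output tuple, and it uses `torch.nn.functional.cosine_similarity` as a stand-in for `finetuner.cos_sim`, which is an assumption about equivalent behavior rather than something the README states.

```python
import torch
import torch.nn.functional as F


def mean_pooling(token_embeddings, attention_mask):
    # Expand the mask to the hidden dimension so padded positions contribute zero.
    mask = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    summed = torch.sum(token_embeddings * mask, dim=1)
    # Clamp the token count to avoid dividing by zero for an all-padding row.
    counts = torch.clamp(mask.sum(dim=1), min=1e-9)
    return summed / counts


# Toy batch (hypothetical values): 2 sequences, 3 positions, hidden size 2.
token_embeddings = torch.tensor(
    [
        [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]],  # last position is padding
        [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]],
    ]
)
attention_mask = torch.tensor([[1, 1, 0], [1, 1, 1]])

embeddings = mean_pooling(token_embeddings, attention_mask)
# Row 0 averages only its two unmasked tokens; row 1 averages all three.
similarity = F.cosine_similarity(embeddings[0], embeddings[1], dim=0)
print(embeddings)
print(similarity.item())
```

The padded position in the first sequence carries deliberately large values, so any leak of padding into the average would be obvious in the printed embeddings.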