Update README.md
</p>
</details>

1. The easiest way to start using `jina-embeddings-v3` is Jina AI's [Embeddings API](https://jina.ai/embeddings/).
2. Alternatively, you can use `jina-embeddings-v3` directly via the transformers package.

```python
# ... (imports and model loading elided in this excerpt) ...
texts = [
    # ...
    'Folge dem weißen Kaninchen.'  # German
]

# When calling the `encode` function, you can choose a `task_type` based on the use case:
# 'retrieval.query', 'retrieval.passage', 'separation', 'classification', 'text-matching'
# Alternatively, you can choose not to pass a `task_type`, and no specific LoRA adapter will be used.
embeddings = model.encode(texts, task_type='text-matching')

# Compute similarities
print(embeddings[0] @ embeddings[1].T)
```
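As a minimal standalone illustration (plain NumPy, not the model API): when embedding vectors are L2-normalized, the dot product used above is exactly cosine similarity, so no extra normalization step is needed at query time.

```python
import numpy as np

# Toy unit-norm vectors standing in for two embedding rows.
a = np.array([0.6, 0.8])
b = np.array([0.8, 0.6])

# For unit-norm vectors, the dot product equals cosine similarity.
print(a @ b)  # 0.96
```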

By default, the model supports a maximum sequence length of 8192 tokens. However, if you want to truncate your input texts to a shorter length, you can pass the `max_length` parameter to the `encode` function:

```python
embeddings = model.encode(
    ['Very long ... document'],
    # ... (remaining arguments elided in this excerpt) ...
)
```

In case you want to use **Matryoshka embeddings** and switch to a different dimension, you can adjust it by passing the `truncate_dim` parameter to the `encode` function:

```python
embeddings = model.encode(
    ['Sample text'],
    # ... (remaining arguments elided in this excerpt) ...
)
```