5CD-AI/Vintern-Embedding-1B

@@ -5,7 +5,42 @@ tags: []
 ## Model Details
-### Model Description
 Dataset:  [GreenNode/GreenNode-Table-Markdown-Retrieval](https://huggingface.co/datasets/GreenNode/GreenNode-Table-Markdown-Retrieval-VN)
@@ -59,16 +94,16 @@ Dataset: [ViDoRe Benchmark](https://huggingface.co/collections/vidore/vidore-ben
 | TIGER-Lab/VLM2Vec-Full                        | 4.2B       | 51.16         | 42.8    | 26.7   | 66.7    | 53.5                    | 63.5   | 64         | 70.7                 | 21.4    |
 | nvidia/llama-nemoretriever-colembed-3b-v1     | 4.4B       | 90.42         | 88.4    | 66.2   | 94.9    | 99.6                    | 96.6   | 97.8       | 99.3                 | 80.6    |
 | nvidia/llama-nemoretriever-colembed-1b-v1     | 2.4B       | 89.8          | 87.6    | 64.5   | 93.6    | 100                     | 96.6   | 96.7       | 99.6                 | 79.8    |
-| jinaai/jina-embeddings-v4                     | 3.8B       | 89.38         | 88.5    | 60.1   | 93.8    | 99.3                    | 97.3   | 96.6       | 99.1                 | 80.3    |
 | nomic-ai/colnomic-embed-multimodal-3b         | 3B       | 89.25         | 88.1    | 61.3   | 92.8    | 96.3                    | 97.4   | 96.6       | 98.3                 | 83.2    |
 | nomic-ai/colnomic-embed-multimodal-7b         | 7B       | 89.00         | 88.3    | 60.1   | 92.2    | 98.8                    | 96.3   | 95.9       | 99.3                 | 81.1    |
 | vidore/colqwen2.5-v0.2                        | 3B       | 89.58         | 88.9    | 63.6   | 92.5    | 99.6                    | 96.1   | 95.8       | 98                   | 82.1    |
-| vidore/colqwen2-v1.0                          | 2.2B       | 89.18         | 88      | 61.5   | 92.5    | 99                      | 95.9   | 95.5       | 98.8                 | 82.2    |
 | ibm-granite/granite-vision-3.3-2b-embedding   | 3B       | 85.98         | 84.2    | 54.6   | 89.7    | 98.9                    | 96.3   | 97.3       | 98.9                 | 67.9    |
 | vidore/colpali-v1.3                           | 3B       | 85.44         | 83.3    | 58.4   | 85.5    | 97.4                    | 94.6   | 96.1       | 97.4                 | 70.8    |
 | vidore/colpali-v1.2                           | 3B       | 83.16         | 77.8    | 56.6   | 82.2    | 97.5                    | 93.8   | 94.4       | 94.9                 | 68.1    |
-| ColVintern-1B                                 | 0.9B        | 78.8          | 71.6    | 48.3   | 84.6    | 92.9                    | 88.7   | 89.4       | 95.2                 | 59.6    |
-| Vintern-Embedding-1B                             | 0.9B        | 82.85         | 75.37   | 51.79  | 86.2    | 97.52                   | 93.19  | 93.97      | 97.09                | 67.72   |
 ## Quickstart:

 ## Model Details
+### Vintern-Embedding-1B – Model Overview
+**Vintern-Embedding-1B** is a next-generation embedding model built on the base model [Vintern-1B-v3\_5](https://huggingface.co/5CD-AI/Vintern-1B-v3_5). It was trained on more than **1.5 million high-quality question–document pairs** covering both **Visual Question Answering (VQA)** and **text-only QA**. This large, diverse dataset lets the model handle a range of **cross-modal retrieval tasks**:
+* **Text → Visual**
+* **Text → Text**
+* **Visual → Visual**
+* **Visual → Text**
+Compared with **ColVintern-1B-v1**, which was more experimental, this version is substantially optimized and raises the ViDoRe average from **78.8** to **82.85**. With only **\~0.9B parameters**, it comes within roughly 0.3 to 3.1 points of several models listed at 3B on that benchmark (colpali-v1.2, colpali-v1.3, granite-vision-3.3-2b-embedding), making it both **lightweight and effective**.
+---
+### Benchmark Highlights
+* **GreenNode/Markdown Table Retrieval (Vietnamese)**
+  * Achieved **MAP\@5 = 57.01** and **Mean = 59.71**, ahead of the multilingual and Vietnamese-specific embedding baselines reported for this benchmark.
+* **GreenNode/Zalo Legal Text Retrieval (Vietnamese)**
+  * Scored **Mean = 73.14**, on par with or surpassing Vietnamese-specialized models, showing strong performance on long-text and legal retrieval tasks.
+* **ViDoRe Benchmark (Global Multimodal Standard)**
+  * Reached **Average Score = 82.85**, improving on **ColVintern-1B-v1 (78.8)** and approaching the performance of several 2B–3B multimodal embedding models.
+  * Particularly strong in domains such as **Artificial Intelligence (97.52)**, **Healthcare (97.09)**, and **Government (93.97)**.
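
The card reports MAP\@5 for the GreenNode table benchmark without giving a formula. The sketch below shows one common definition, assuming average precision is truncated at rank k and normalized by `min(number of relevant documents, k)`; the benchmark's own evaluation script may use a different normalization. The function names and the toy document IDs are hypothetical and only illustrate the metric.

```python
from typing import Sequence, Set


def average_precision_at_k(ranked_ids: Sequence[str],
                           relevant_ids: Set[str],
                           k: int = 5) -> float:
    """AP@k for one query (assumed convention: normalize by min(|relevant|, k))."""
    if not relevant_ids:
        return 0.0
    hits = 0
    precision_sum = 0.0
    # Walk the top-k results; each relevant hit contributes precision at its rank.
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant_ids:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / min(len(relevant_ids), k)


def mean_average_precision_at_k(rankings: Sequence[Sequence[str]],
                                relevants: Sequence[Set[str]],
                                k: int = 5) -> float:
    """MAP@k: the mean of AP@k over all queries."""
    scores = [average_precision_at_k(r, rel, k)
              for r, rel in zip(rankings, relevants)]
    return sum(scores) / len(scores) if scores else 0.0


# Toy data (hypothetical): two queries, one relevant document each.
rankings = [["d3", "d1", "d7"], ["a", "b"]]
relevants = [{"d1"}, {"a"}]
map_at_5 = mean_average_precision_at_k(rankings, relevants, k=5)
print(f"MAP@5 = {100 * map_at_5:.2f}")
```

In this toy case the first query finds its relevant document at rank 2 (AP = 0.5) and the second at rank 1 (AP = 1.0), so MAP\@5 is 0.75, or 75.00 on the 0 to 100 scale the card uses.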
+---
+### Summary
+👉 **Vintern-Embedding-1B (v2)** delivers **robust cross-modal retrieval**, performs well on both **Vietnamese-specific** and **global multimodal benchmarks**, and remains **efficient at \~0.9B parameters**. It is a strong choice for **RAG pipelines**, **multimodal search engines**, and **information retrieval applications** in both **English and Vietnamese**.
+### Benchmarks
 Dataset:  [GreenNode/GreenNode-Table-Markdown-Retrieval](https://huggingface.co/datasets/GreenNode/GreenNode-Table-Markdown-Retrieval-VN)
 | TIGER-Lab/VLM2Vec-Full                        | 4.2B       | 51.16         | 42.8    | 26.7   | 66.7    | 53.5                    | 63.5   | 64         | 70.7                 | 21.4    |
 | nvidia/llama-nemoretriever-colembed-3b-v1     | 4.4B       | 90.42         | 88.4    | 66.2   | 94.9    | 99.6                    | 96.6   | 97.8       | 99.3                 | 80.6    |
 | nvidia/llama-nemoretriever-colembed-1b-v1     | 2.4B       | 89.8          | 87.6    | 64.5   | 93.6    | 100                     | 96.6   | 96.7       | 99.6                 | 79.8    |
+| jinaai/jina-embeddings-v4                     | 3.8B     | 89.38         | 88.5    | 60.1   | 93.8    | 99.3                    | 97.3   | 96.6       | 99.1                 | 80.3    |
 | nomic-ai/colnomic-embed-multimodal-3b         | 3B       | 89.25         | 88.1    | 61.3   | 92.8    | 96.3                    | 97.4   | 96.6       | 98.3                 | 83.2    |
 | nomic-ai/colnomic-embed-multimodal-7b         | 7B       | 89.00         | 88.3    | 60.1   | 92.2    | 98.8                    | 96.3   | 95.9       | 99.3                 | 81.1    |
 | vidore/colqwen2.5-v0.2                        | 3B       | 89.58         | 88.9    | 63.6   | 92.5    | 99.6                    | 96.1   | 95.8       | 98                   | 82.1    |
+| vidore/colqwen2-v1.0                          | 2.2B     | 89.18         | 88      | 61.5   | 92.5    | 99                      | 95.9   | 95.5       | 98.8                 | 82.2    |
 | ibm-granite/granite-vision-3.3-2b-embedding   | 3B       | 85.98         | 84.2    | 54.6   | 89.7    | 98.9                    | 96.3   | 97.3       | 98.9                 | 67.9    |
 | vidore/colpali-v1.3                           | 3B       | 85.44         | 83.3    | 58.4   | 85.5    | 97.4                    | 94.6   | 96.1       | 97.4                 | 70.8    |
 | vidore/colpali-v1.2                           | 3B       | 83.16         | 77.8    | 56.6   | 82.2    | 97.5                    | 93.8   | 94.4       | 94.9                 | 68.1    |
+| ColVintern-1B                                 | 0.9B     | 78.8          | 71.6    | 48.3   | 84.6    | 92.9                    | 88.7   | 89.4       | 95.2                 | 59.6    |
+| **Vintern-Embedding-1B**                      | 0.9B     | 82.85         | 75.37   | 51.79  | 86.2    | 97.52                   | 93.19  | 93.97      | 97.09                | 67.72   |
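
As a quick sanity check on the two Vintern rows above, the sketch below recomputes the average column as the unweighted mean of the eight per-task scores and derives the gain over ColVintern-1B. Treating the average as a plain mean is an inference from the numbers, not something the card states, and the variable names are illustrative.

```python
# Per-task ViDoRe scores copied from the two table rows, in column order.
rows = {
    "ColVintern-1B": [71.6, 48.3, 84.6, 92.9, 88.7, 89.4, 95.2, 59.6],
    "Vintern-Embedding-1B": [75.37, 51.79, 86.2, 97.52, 93.19, 93.97, 97.09, 67.72],
}

# Assumption: the "average" column is the unweighted mean of the task columns.
averages = {name: sum(scores) / len(scores) for name, scores in rows.items()}
gain = averages["Vintern-Embedding-1B"] - averages["ColVintern-1B"]

for name, avg in averages.items():
    print(f"{name}: {avg:.3f}")
print(f"gain: {gain:.2f}")
```

The recomputed means are about 82.856 and 78.788, consistent with the listed 82.85 and 78.8, which puts the improvement at roughly 4.07 points.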
 ## Quickstart: