Update README.md
README.md
CHANGED
@@ -5,7 +5,42 @@ tags: []
|
|
5 |
|
6 |
## Model Details
|
7 |
|
8 |
-
### Model
|
9 |
|
10 |
Dataset: [GreenNode/GreenNode-Table-Markdown-Retrieval](https://huggingface.co/datasets/GreenNode/GreenNode-Table-Markdown-Retrieval-VN)
|
11 |
|
@@ -59,16 +94,16 @@ Dataset: [ViDoRe Benchmark](https://huggingface.co/collections/vidore/vidore-ben
|
|
59 |
| TIGER-Lab/VLM2Vec-Full | 4.2B | 51.16 | 42.8 | 26.7 | 66.7 | 53.5 | 63.5 | 64 | 70.7 | 21.4 |
|
60 |
| nvidia/llama-nemoretriever-colembed-3b-v1 | 4.4B | 90.42 | 88.4 | 66.2 | 94.9 | 99.6 | 96.6 | 97.8 | 99.3 | 80.6 |
|
61 |
| nvidia/llama-nemoretriever-colembed-1b-v1 | 2.4B | 89.8 | 87.6 | 64.5 | 93.6 | 100 | 96.6 | 96.7 | 99.6 | 79.8 |
|
62 |
-
| jinaai/jina-embeddings-v4 | 3.8B
|
63 |
| nomic-ai/colnomic-embed-multimodal-3b | 3B | 89.25 | 88.1 | 61.3 | 92.8 | 96.3 | 97.4 | 96.6 | 98.3 | 83.2 |
|
64 |
| nomic-ai/colnomic-embed-multimodal-7b | 7B | 89.00 | 88.3 | 60.1 | 92.2 | 98.8 | 96.3 | 95.9 | 99.3 | 81.1 |
|
65 |
| vidore/colqwen2.5-v0.2 | 3B | 89.58 | 88.9 | 63.6 | 92.5 | 99.6 | 96.1 | 95.8 | 98 | 82.1 |
|
66 |
-
| vidore/colqwen2-v1.0 | 2.2B
|
67 |
| ibm-granite/granite-vision-3.3-2b-embedding | 3B | 85.98 | 84.2 | 54.6 | 89.7 | 98.9 | 96.3 | 97.3 | 98.9 | 67.9 |
|
68 |
| vidore/colpali-v1.3 | 3B | 85.44 | 83.3 | 58.4 | 85.5 | 97.4 | 94.6 | 96.1 | 97.4 | 70.8 |
|
69 |
| vidore/colpali-v1.2 | 3B | 83.16 | 77.8 | 56.6 | 82.2 | 97.5 | 93.8 | 94.4 | 94.9 | 68.1 |
|
70 |
-
| ColVintern-1B | 0.9B
|
71 |
-
| Vintern-Embedding-1B
|
72 |
|
73 |
## Quickstart:
|
74 |
|
|
|
5 |
|
6 |
## Model Details
|
7 |
|
8 |
+
### Vintern-Embedding-1B – Model Overview
|
9 |
+
|
10 |
+
**Vintern-Embedding-1B** is the next-generation embedding model built on the base model [Vintern-1B-v3_5](https://huggingface.co/5CD-AI/Vintern-1B-v3_5). It was trained on more than **1.5 million high-quality question–document pairs** covering both **Visual Question Answering (VQA)** and **text-only QA** tasks. Training on this mix lets the model handle both **cross-modal** and **single-modality retrieval tasks**:
|
11 |
+
|
12 |
+
* **Text → Visual**
|
13 |
+
* **Text → Text**
|
14 |
+
* **Visual → Visual**
|
15 |
+
* **Visual → Text**
|
16 |
+
|
17 |
+
Compared to **ColVintern-1B-v1**, which was more experimental, this version is further optimized and achieves **higher retrieval quality** (a ViDoRe average of 82.85 versus 78.8). With only **~0.9B parameters**, it approaches the scores of several larger 2B–7B multimodal embedding models while remaining **lightweight**.
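The four retrieval directions listed above share one pattern: queries and documents are embedded into a common space and documents are ranked by similarity to the query. The snippet below is a minimal sketch of that ranking step only. It assumes one vector per item and cosine similarity, neither of which is confirmed by this README; a multi-vector model would replace `cosine` with its own scoring function. The vectors and the names `rank`, `page_image`, and `text_chunk` are hypothetical placeholders, not model output.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank(query_vec, doc_vecs):
    """Return document ids sorted from most to least similar to the query."""
    scores = {doc_id: cosine(query_vec, vec) for doc_id, vec in doc_vecs.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Placeholder vectors (assumption): in practice these would come from the
# embedding model, for either a text or an image input.
query = [1.0, 0.0, 1.0]
docs = {
    "page_image": [0.9, 0.1, 0.8],
    "text_chunk": [0.0, 1.0, 0.1],
}

ranking = rank(query, docs)
print(ranking)
```

Because the ranking step only sees vectors, the same code covers Text → Visual, Text → Text, Visual → Visual, and Visual → Text once both sides are embedded.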
|
18 |
+
|
19 |
+
---
|
20 |
+
|
21 |
+
### Benchmark Highlights
|
22 |
+
|
23 |
+
* **GreenNode/Markdown Table Retrieval (Vietnamese)**
|
24 |
+
|
25 |
+
* Achieved **MAP@5 = 57.01** and **Mean = 59.71**, outperforming the multilingual and Vietnamese-specific embedding baselines compared in the Benchmarks section.
|
26 |
+
|
27 |
+
* **GreenNode/Zalo Legal Text Retrieval (Vietnamese)**
|
28 |
+
|
29 |
+
* Scored **Mean = 73.14**, on par with or surpassing Vietnamese-specialized models, showing strong performance on long-text and legal retrieval tasks.
|
30 |
+
|
31 |
+
* **ViDoRe Benchmark (Global Multimodal Standard)**
|
32 |
+
|
33 |
+
* Reached **Average Score = 82.85**, improving over **ColVintern-1B-v1 (78.8)** and approaching the performance of several 2B–3B multimodal embedding models.
|
34 |
+
* Particularly strong in domains such as **Artificial Intelligence (97.52)**, **Healthcare (97.09)**, and **Government (93.97)**.
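For readers unfamiliar with the MAP@5 metric quoted above, the following is a minimal sketch of one common definition of mean average precision at a cutoff `k`. The exact variant used by the GreenNode benchmarks is not stated here; in particular, normalizing by `min(number of relevant documents, k)` is an assumption. The document ids and function names are hypothetical.

```python
def average_precision_at_k(ranked_ids, relevant_ids, k=5):
    """Average precision over the top-k results for a single query."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for position, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant:
            hits += 1
            # Precision at this rank, counted only where a relevant hit occurs.
            precision_sum += hits / position
    # Assumed normalization: the best achievable number of hits within k.
    return precision_sum / min(len(relevant), k)


def mean_average_precision_at_k(runs, k=5):
    """Mean of per-query average precision; runs is a list of (ranking, relevant)."""
    return sum(average_precision_at_k(r, rel, k) for r, rel in runs) / len(runs)


# Two toy queries: the relevant document is ranked first, then second.
runs = [
    (["d1", "d2", "d3", "d4", "d5"], ["d1"]),
    (["d9", "d7", "d8", "d6", "d5"], ["d7"]),
]
map_at_5 = mean_average_precision_at_k(runs, k=5)
print(map_at_5)
```

Benchmark tables usually report the value multiplied by 100, so a score of 57.01 corresponds to 0.5701 on this scale.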
|
35 |
+
|
36 |
+
---
|
37 |
+
|
38 |
+
### Summary
|
39 |
+
|
40 |
+
👉 **Vintern-Embedding-1B (v2)** delivers **robust cross-modal retrieval**, performs well on both **Vietnamese-specific** and **global multimodal benchmarks**, and remains **efficient at ~0.9B parameters**. It is a strong choice for **RAG pipelines**, **multimodal search engines**, and **information retrieval applications** in both **English and Vietnamese**.
|
41 |
+
|
42 |
+
|
43 |
+
### Benchmarks
|
44 |
|
45 |
Dataset: [GreenNode/GreenNode-Table-Markdown-Retrieval](https://huggingface.co/datasets/GreenNode/GreenNode-Table-Markdown-Retrieval-VN)
|
46 |
|
|
|
94 |
| TIGER-Lab/VLM2Vec-Full | 4.2B | 51.16 | 42.8 | 26.7 | 66.7 | 53.5 | 63.5 | 64 | 70.7 | 21.4 |
|
95 |
| nvidia/llama-nemoretriever-colembed-3b-v1 | 4.4B | 90.42 | 88.4 | 66.2 | 94.9 | 99.6 | 96.6 | 97.8 | 99.3 | 80.6 |
|
96 |
| nvidia/llama-nemoretriever-colembed-1b-v1 | 2.4B | 89.8 | 87.6 | 64.5 | 93.6 | 100 | 96.6 | 96.7 | 99.6 | 79.8 |
|
97 |
+
| jinaai/jina-embeddings-v4 | 3.8B | 89.38 | 88.5 | 60.1 | 93.8 | 99.3 | 97.3 | 96.6 | 99.1 | 80.3 |
|
98 |
| nomic-ai/colnomic-embed-multimodal-3b | 3B | 89.25 | 88.1 | 61.3 | 92.8 | 96.3 | 97.4 | 96.6 | 98.3 | 83.2 |
|
99 |
| nomic-ai/colnomic-embed-multimodal-7b | 7B | 89.00 | 88.3 | 60.1 | 92.2 | 98.8 | 96.3 | 95.9 | 99.3 | 81.1 |
|
100 |
| vidore/colqwen2.5-v0.2 | 3B | 89.58 | 88.9 | 63.6 | 92.5 | 99.6 | 96.1 | 95.8 | 98 | 82.1 |
|
101 |
+
| vidore/colqwen2-v1.0 | 2.2B | 89.18 | 88 | 61.5 | 92.5 | 99 | 95.9 | 95.5 | 98.8 | 82.2 |
|
102 |
| ibm-granite/granite-vision-3.3-2b-embedding | 3B | 85.98 | 84.2 | 54.6 | 89.7 | 98.9 | 96.3 | 97.3 | 98.9 | 67.9 |
|
103 |
| vidore/colpali-v1.3 | 3B | 85.44 | 83.3 | 58.4 | 85.5 | 97.4 | 94.6 | 96.1 | 97.4 | 70.8 |
|
104 |
| vidore/colpali-v1.2 | 3B | 83.16 | 77.8 | 56.6 | 82.2 | 97.5 | 93.8 | 94.4 | 94.9 | 68.1 |
|
105 |
+
| ColVintern-1B | 0.9B | 78.8 | 71.6 | 48.3 | 84.6 | 92.9 | 88.7 | 89.4 | 95.2 | 59.6 |
|
106 |
+
| **Vintern-Embedding-1B** | 0.9B | 82.85 | 75.37 | 51.79 | 86.2 | 97.52 | 93.19 | 93.97 | 97.09 | 67.72 |
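As a quick sanity check on the two Vintern rows, the sketch below recomputes the average column from the eight per-domain scores. It assumes the average is an unweighted mean of those eight columns, which the visible part of the table does not state explicitly.

```python
# Per-domain ViDoRe scores copied from the two Vintern rows above.
scores = {
    "ColVintern-1B": [71.6, 48.3, 84.6, 92.9, 88.7, 89.4, 95.2, 59.6],
    "Vintern-Embedding-1B": [75.37, 51.79, 86.2, 97.52, 93.19, 93.97, 97.09, 67.72],
}

# Assumption: the reported average is the plain mean of the eight columns.
averages = {name: sum(vals) / len(vals) for name, vals in scores.items()}
gain = averages["Vintern-Embedding-1B"] - averages["ColVintern-1B"]

for name, avg in averages.items():
    print(f"{name}: {avg:.3f}")
print(f"gain: {gain:.3f}")
```

Under that assumption the means come out at roughly 78.79 and 82.86, which agrees with the reported 78.8 and 82.85 up to rounding or truncation in the last digit, for a gain of about 4 points.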
|
107 |
|
108 |
## Quickstart:
|
109 |
|