khang119966 committed
Commit a33af6d · verified · 1 Parent(s): f985fc1

Update README.md

Files changed (1)
  1. README.md +40 -5
README.md CHANGED
@@ -5,7 +5,42 @@ tags: []
 
  ## Model Details
 
- ### Model Description
 
  Dataset: [GreenNode/GreenNode-Table-Markdown-Retrieval](https://huggingface.co/datasets/GreenNode/GreenNode-Table-Markdown-Retrieval-VN)
 
@@ -59,16 +94,16 @@ Dataset: [ViDoRe Benchmark](https://huggingface.co/collections/vidore/vidore-ben
  | TIGER-Lab/VLM2Vec-Full | 4.2B | 51.16 | 42.8 | 26.7 | 66.7 | 53.5 | 63.5 | 64 | 70.7 | 21.4 |
  | nvidia/llama-nemoretriever-colembed-3b-v1 | 4.4B | 90.42 | 88.4 | 66.2 | 94.9 | 99.6 | 96.6 | 97.8 | 99.3 | 80.6 |
  | nvidia/llama-nemoretriever-colembed-1b-v1 | 2.4B | 89.8 | 87.6 | 64.5 | 93.6 | 100 | 96.6 | 96.7 | 99.6 | 79.8 |
- | jinaai/jina-embeddings-v4 | 3.8B | 89.38 | 88.5 | 60.1 | 93.8 | 99.3 | 97.3 | 96.6 | 99.1 | 80.3 |
  | nomic-ai/colnomic-embed-multimodal-3b | 3B | 89.25 | 88.1 | 61.3 | 92.8 | 96.3 | 97.4 | 96.6 | 98.3 | 83.2 |
  | nomic-ai/colnomic-embed-multimodal-7b | 7B | 89.00 | 88.3 | 60.1 | 92.2 | 98.8 | 96.3 | 95.9 | 99.3 | 81.1 |
  | vidore/colqwen2.5-v0.2 | 3B | 89.58 | 88.9 | 63.6 | 92.5 | 99.6 | 96.1 | 95.8 | 98 | 82.1 |
- | vidore/colqwen2-v1.0 | 2.2B | 89.18 | 88 | 61.5 | 92.5 | 99 | 95.9 | 95.5 | 98.8 | 82.2 |
  | ibm-granite/granite-vision-3.3-2b-embedding | 3B | 85.98 | 84.2 | 54.6 | 89.7 | 98.9 | 96.3 | 97.3 | 98.9 | 67.9 |
  | vidore/colpali-v1.3 | 3B | 85.44 | 83.3 | 58.4 | 85.5 | 97.4 | 94.6 | 96.1 | 97.4 | 70.8 |
  | vidore/colpali-v1.2 | 3B | 83.16 | 77.8 | 56.6 | 82.2 | 97.5 | 93.8 | 94.4 | 94.9 | 68.1 |
- | ColVintern-1B | 0.9B | 78.8 | 71.6 | 48.3 | 84.6 | 92.9 | 88.7 | 89.4 | 95.2 | 59.6 |
- | Vintern-Embedding-1B | 0.9B | 82.85 | 75.37 | 51.79 | 86.2 | 97.52 | 93.19 | 93.97 | 97.09 | 67.72 |
 
  ## Quickstart:
 
 
 
  ## Model Details
 
+ ### Vintern-Embedding-1B – Model Overview
+
+ **Vintern-Embedding-1B** is the next-generation embedding model built on the [Vintern-1B-v3\_5](https://huggingface.co/5CD-AI/Vintern-1B-v3_5) base model. It was trained on more than **1.5 million high-quality question–document pairs** covering both **Visual Question Answering (VQA)** and **text-only QA**. This large and diverse training set lets the model handle a range of **cross-modal retrieval tasks**:
+
+ * **Text → Visual**
+ * **Text → Text**
+ * **Visual → Visual**
+ * **Visual → Text**
+
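The four retrieval directions listed above all come down to scoring a query embedding against candidate embeddings, whatever modality each side came from. The following is a minimal sketch of that ranking step, assuming single-vector embeddings compared by cosine similarity; this README excerpt does not state how the model actually scores pairs (it may use a different scheme, such as multi-vector late interaction), and the vectors and ids below are made-up placeholders, not model outputs.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rank(query_vec, doc_vecs):
    """Return document ids sorted by descending cosine similarity to the query."""
    scores = {doc_id: cosine(query_vec, vec) for doc_id, vec in doc_vecs.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Placeholder vectors (hypothetical, for illustration only).
# A text query can be ranked against image and text candidates alike,
# because all of them live in the same embedding space.
query = [0.9, 0.1, 0.0]
docs = {
    "page_image_1": [0.8, 0.2, 0.1],
    "text_chunk_7": [0.1, 0.9, 0.3],
    "page_image_4": [0.0, 0.2, 0.9],
}
ranking = rank(query, docs)
```

Swapping which side is text and which is an image changes only how the vectors are produced, not the ranking step itself.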
+ Compared to **ColVintern-1B-v1**, which was more experimental, this version is substantially optimized and raises the ViDoRe average from 78.8 to 82.85. With only **\~0.9B parameters**, it approaches the scores of some larger 2B–7B multimodal embedding models (for example, 82.85 vs. 83.16 for vidore/colpali-v1.2 on ViDoRe), making it both **lightweight and effective**.
+
+ ---
+
+ ### Benchmark Highlights
+
+ * **GreenNode/Markdown Table Retrieval (Vietnamese)**
+
+   * Achieved **MAP\@5 = 57.01** and **Mean = 59.71**, ahead of the multilingual and Vietnamese-specific embedding baselines compared in the Benchmarks section.
+
+ * **GreenNode/Zalo Legal Text Retrieval (Vietnamese)**
+
+   * Scored **Mean = 73.14**, on par with or ahead of Vietnamese-specialized models, indicating strong performance on long-text and legal retrieval tasks.
+
+ * **ViDoRe Benchmark (Global Multimodal Standard)**
+
+   * Reached **Average Score = 82.85**, improving over **ColVintern-1B-v1 (78.8)** and approaching the performance of several 2B–3B multimodal embedding models.
+   * Particularly strong in domains such as **Artificial Intelligence (97.52)**, **Healthcare (97.09)**, and **Government (93.97)**.
+
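For readers unfamiliar with the MAP@5 metric quoted in the highlights above, here is a minimal sketch of one common way to compute it. It assumes average precision is normalized by `min(number of relevant documents, k)`; the benchmark's own evaluation code may use a different convention, and the document ids below are hypothetical. Scores in the README are reported on a 0–100 scale, whereas this sketch returns a value in [0, 1].

```python
def average_precision_at_k(ranked_ids, relevant_ids, k=5):
    """Average precision over the top-k results for a single query."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for position, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant:
            hits += 1
            # Precision at this cut-off, counted only at relevant positions.
            precision_sum += hits / position
    # Assumed normalization: at most k relevant documents can be retrieved.
    return precision_sum / min(len(relevant), k)

def map_at_k(runs, k=5):
    """Mean of per-query average precision; runs is a list of (ranking, relevant) pairs."""
    return sum(average_precision_at_k(r, rel, k) for r, rel in runs) / len(runs)

# Hypothetical queries: the single relevant document sits at rank 2, then rank 1.
runs = [
    (["d3", "d1", "d9", "d4", "d2"], ["d1"]),  # AP@5 = 1/2
    (["d5", "d6", "d7", "d8", "d0"], ["d5"]),  # AP@5 = 1/1
]
score = map_at_k(runs)  # (0.5 + 1.0) / 2 = 0.75
```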
+ ---
+
+ ### Summary
+
+ 👉 **Vintern-Embedding-1B (v2)** delivers **robust cross-modal retrieval**, performs well on both **Vietnamese-specific** and **global multimodal benchmarks**, and remains **efficient at \~0.9B parameters**. It is a strong choice for **RAG pipelines**, **multimodal search engines**, and **information retrieval applications** in both **English and Vietnamese**.
+
+
+ ### Benchmarks
 
  Dataset: [GreenNode/GreenNode-Table-Markdown-Retrieval](https://huggingface.co/datasets/GreenNode/GreenNode-Table-Markdown-Retrieval-VN)
 
 
  | TIGER-Lab/VLM2Vec-Full | 4.2B | 51.16 | 42.8 | 26.7 | 66.7 | 53.5 | 63.5 | 64 | 70.7 | 21.4 |
  | nvidia/llama-nemoretriever-colembed-3b-v1 | 4.4B | 90.42 | 88.4 | 66.2 | 94.9 | 99.6 | 96.6 | 97.8 | 99.3 | 80.6 |
  | nvidia/llama-nemoretriever-colembed-1b-v1 | 2.4B | 89.8 | 87.6 | 64.5 | 93.6 | 100 | 96.6 | 96.7 | 99.6 | 79.8 |
+ | jinaai/jina-embeddings-v4 | 3.8B | 89.38 | 88.5 | 60.1 | 93.8 | 99.3 | 97.3 | 96.6 | 99.1 | 80.3 |
  | nomic-ai/colnomic-embed-multimodal-3b | 3B | 89.25 | 88.1 | 61.3 | 92.8 | 96.3 | 97.4 | 96.6 | 98.3 | 83.2 |
  | nomic-ai/colnomic-embed-multimodal-7b | 7B | 89.00 | 88.3 | 60.1 | 92.2 | 98.8 | 96.3 | 95.9 | 99.3 | 81.1 |
  | vidore/colqwen2.5-v0.2 | 3B | 89.58 | 88.9 | 63.6 | 92.5 | 99.6 | 96.1 | 95.8 | 98 | 82.1 |
+ | vidore/colqwen2-v1.0 | 2.2B | 89.18 | 88 | 61.5 | 92.5 | 99 | 95.9 | 95.5 | 98.8 | 82.2 |
  | ibm-granite/granite-vision-3.3-2b-embedding | 3B | 85.98 | 84.2 | 54.6 | 89.7 | 98.9 | 96.3 | 97.3 | 98.9 | 67.9 |
  | vidore/colpali-v1.3 | 3B | 85.44 | 83.3 | 58.4 | 85.5 | 97.4 | 94.6 | 96.1 | 97.4 | 70.8 |
  | vidore/colpali-v1.2 | 3B | 83.16 | 77.8 | 56.6 | 82.2 | 97.5 | 93.8 | 94.4 | 94.9 | 68.1 |
+ | ColVintern-1B | 0.9B | 78.8 | 71.6 | 48.3 | 84.6 | 92.9 | 88.7 | 89.4 | 95.2 | 59.6 |
+ | **Vintern-Embedding-1B** | 0.9B | 82.85 | 75.37 | 51.79 | 86.2 | 97.52 | 93.19 | 93.97 | 97.09 | 67.72 |
 
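The average column in the table above appears to be the unweighted mean of the eight per-dataset scores in each row. The short check below works under that assumption, using only the numbers from the ColVintern-1B and Vintern-Embedding-1B rows; the dictionary keys are just labels for this sketch.

```python
# Per-dataset ViDoRe scores copied from the last two rows of the table.
scores = {
    "Vintern-Embedding-1B": [75.37, 51.79, 86.2, 97.52, 93.19, 93.97, 97.09, 67.72],
    "ColVintern-1B": [71.6, 48.3, 84.6, 92.9, 88.7, 89.4, 95.2, 59.6],
}

# Unweighted mean per model (assumed to be how the average column is derived).
means = {name: sum(vals) / len(vals) for name, vals in scores.items()}

# Absolute improvement of the new model over its predecessor, in points.
gain = means["Vintern-Embedding-1B"] - means["ColVintern-1B"]
```

Under this assumption the means come out at roughly 82.86 and 78.79, which matches the reported 82.85 and 78.8 up to rounding, for a gain of about 4 points.
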
  ## Quickstart: