Update README.md
README.md
CHANGED
|
@@ -5,7 +5,42 @@ tags: []
|
|
| 5 |
|
| 6 |
## Model Details
|
| 7 |
|
| 8 |
-
### Model
|
| 9 |
|
| 10 |
Dataset: [GreenNode/GreenNode-Table-Markdown-Retrieval](https://huggingface.co/datasets/GreenNode/GreenNode-Table-Markdown-Retrieval-VN)
|
| 11 |
|
|
@@ -59,16 +94,16 @@ Dataset: [ViDoRe Benchmark](https://huggingface.co/collections/vidore/vidore-ben
|
|
| 59 |
| TIGER-Lab/VLM2Vec-Full | 4.2B | 51.16 | 42.8 | 26.7 | 66.7 | 53.5 | 63.5 | 64 | 70.7 | 21.4 |
|
| 60 |
| nvidia/llama-nemoretriever-colembed-3b-v1 | 4.4B | 90.42 | 88.4 | 66.2 | 94.9 | 99.6 | 96.6 | 97.8 | 99.3 | 80.6 |
|
| 61 |
| nvidia/llama-nemoretriever-colembed-1b-v1 | 2.4B | 89.8 | 87.6 | 64.5 | 93.6 | 100 | 96.6 | 96.7 | 99.6 | 79.8 |
|
| 62 |
-
| jinaai/jina-embeddings-v4 | 3.8B
|
| 63 |
| nomic-ai/colnomic-embed-multimodal-3b | 3B | 89.25 | 88.1 | 61.3 | 92.8 | 96.3 | 97.4 | 96.6 | 98.3 | 83.2 |
|
| 64 |
| nomic-ai/colnomic-embed-multimodal-7b | 7B | 89.00 | 88.3 | 60.1 | 92.2 | 98.8 | 96.3 | 95.9 | 99.3 | 81.1 |
|
| 65 |
| vidore/colqwen2.5-v0.2 | 3B | 89.58 | 88.9 | 63.6 | 92.5 | 99.6 | 96.1 | 95.8 | 98 | 82.1 |
|
| 66 |
-
| vidore/colqwen2-v1.0 | 2.2B
|
| 67 |
| ibm-granite/granite-vision-3.3-2b-embedding | 3B | 85.98 | 84.2 | 54.6 | 89.7 | 98.9 | 96.3 | 97.3 | 98.9 | 67.9 |
|
| 68 |
| vidore/colpali-v1.3 | 3B | 85.44 | 83.3 | 58.4 | 85.5 | 97.4 | 94.6 | 96.1 | 97.4 | 70.8 |
|
| 69 |
| vidore/colpali-v1.2 | 3B | 83.16 | 77.8 | 56.6 | 82.2 | 97.5 | 93.8 | 94.4 | 94.9 | 68.1 |
|
| 70 |
-
| ColVintern-1B | 0.9B
|
| 71 |
-
| Vintern-Embedding-1B
|
| 72 |
|
| 73 |
## Quickstart:
|
| 74 |
|
|
|
|
| 5 |
|
| 6 |
## Model Details
|
| 7 |
|
| 8 |
+
### Vintern-Embedding-1B – Model Overview
|
| 9 |
+
|
| 10 |
+
**Vintern-Embedding-1B** is an embedding model built on the [Vintern-1B-v3_5](https://huggingface.co/5CD-AI/Vintern-1B-v3_5) base model. It was trained on more than **1.5 million high-quality question–document pairs** covering both **Visual Question Answering (VQA)** and **text-only QA**. This lets the model handle retrieval within and across modalities:
|
| 11 |
+
|
| 12 |
+
* **Text → Visual**
|
| 13 |
+
* **Text → Text**
|
| 14 |
+
* **Visual → Visual**
|
| 15 |
+
* **Visual → Text**
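What the four directions above have in common is that queries and documents, whether text or images, end up in one embedding space, so retrieval reduces to ranking by vector similarity. The sketch below illustrates that idea only: it uses hand-written toy vectors and plain cosine similarity, does not call the model, and the `cosine` and `rank` helpers are hypothetical. The scoring actually used by Vintern-Embedding-1B may differ (for example, multi-vector late interaction).

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank(query_vec, corpus):
    """Return corpus ids sorted by descending similarity to the query."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in corpus.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [doc_id for doc_id, _ in scored]

# Toy embeddings (assumed values): images and text share one vector space.
corpus = {
    "page_image_1": [0.9, 0.1, 0.0],
    "text_passage_1": [0.1, 0.9, 0.1],
    "page_image_2": [0.0, 0.2, 0.9],
}
text_query = [0.8, 0.2, 0.1]  # stands in for an embedded text question

ranking = rank(text_query, corpus)
print(ranking)  # ['page_image_1', 'text_passage_1', 'page_image_2']
```

Swapping the query for an image embedding gives the Visual → Visual and Visual → Text cases without changing the ranking code.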
|
| 16 |
+
|
| 17 |
+
Compared with **ColVintern-1B-v1**, which was more experimental, this version is substantially better optimized and raises the ViDoRe average from 78.8 to 82.85. With only **~0.9B parameters**, it comes close to larger 2B–7B multimodal embedding models while remaining **lightweight**.
|
| 18 |
+
|
| 19 |
+
---
|
| 20 |
+
|
| 21 |
+
### Benchmark Highlights
|
| 22 |
+
|
| 23 |
+
* **GreenNode/Markdown Table Retrieval (Vietnamese)**
|
| 24 |
+
|
| 25 |
+
* Achieved **MAP@5 = 57.01** and **Mean = 59.71**, outperforming the multilingual and Vietnamese-specific embedding baselines listed in the Benchmarks section.
|
| 26 |
+
|
| 27 |
+
* **GreenNode/Zalo Legal Text Retrieval (Vietnamese)**
|
| 28 |
+
|
| 29 |
+
* Scored **Mean = 73.14**, on par with or above Vietnamese-specialized models, indicating strong performance on long-text and legal retrieval.
|
| 30 |
+
|
| 31 |
+
* **ViDoRe Benchmark (Global Multimodal Standard)**
|
| 32 |
+
|
| 33 |
+
* Reached **Average Score = 82.85**, up 4.05 points from **ColVintern-1B-v1 (78.8)** and approaching several 2B–3B multimodal embedding models.
|
| 34 |
+
* Particularly strong in domains such as **Artificial Intelligence (97.52)**, **Healthcare (97.09)**, and **Government (93.97)**.
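For readers unfamiliar with the MAP@5 metric quoted above, here is a minimal sketch of one common definition: average precision over the top 5 results per query, averaged across queries. The function names and toy rankings are hypothetical, and the benchmark's own evaluation script may normalize differently. The sketch returns a fraction in [0, 1], whereas the README appears to report scores on a 0–100 scale.

```python
def average_precision_at_k(ranked_ids, relevant_ids, k=5):
    """AP@k for one query: mean of precision values at each relevant hit."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for position, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / position
    # Normalize by the best achievable number of hits within the top k.
    return precision_sum / min(len(relevant), k)

def mean_average_precision_at_k(runs, k=5):
    """MAP@k: average of AP@k over (ranking, relevant set) pairs."""
    return sum(average_precision_at_k(r, rel, k) for r, rel in runs) / len(runs)

# Toy data (assumed): two queries, each with one relevant document.
runs = [
    (["d3", "d1", "d7", "d2", "d9"], {"d1"}),  # relevant doc at rank 2
    (["d4", "d5", "d6", "d8", "d0"], {"d4"}),  # relevant doc at rank 1
]
map5 = mean_average_precision_at_k(runs, k=5)
print(map5)  # 0.75
```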
|
| 35 |
+
|
| 36 |
+
---
|
| 37 |
+
|
| 38 |
+
### Summary
|
| 39 |
+
|
| 40 |
+
👉 **Vintern-Embedding-1B (v2)** delivers **robust cross-modal retrieval**, performs well on both **Vietnamese-specific** and **global multimodal benchmarks**, and stays **efficient at ~0.9B parameters**. It is a strong choice for **RAG pipelines**, **multimodal search engines**, and **information retrieval applications** in both **English and Vietnamese**.
|
| 41 |
+
|
| 42 |
+
|
| 43 |
+
### Benchmarks
|
| 44 |
|
| 45 |
Dataset: [GreenNode/GreenNode-Table-Markdown-Retrieval](https://huggingface.co/datasets/GreenNode/GreenNode-Table-Markdown-Retrieval-VN)
|
| 46 |
|
|
|
|
| 94 |
| TIGER-Lab/VLM2Vec-Full | 4.2B | 51.16 | 42.8 | 26.7 | 66.7 | 53.5 | 63.5 | 64 | 70.7 | 21.4 |
|
| 95 |
| nvidia/llama-nemoretriever-colembed-3b-v1 | 4.4B | 90.42 | 88.4 | 66.2 | 94.9 | 99.6 | 96.6 | 97.8 | 99.3 | 80.6 |
|
| 96 |
| nvidia/llama-nemoretriever-colembed-1b-v1 | 2.4B | 89.8 | 87.6 | 64.5 | 93.6 | 100 | 96.6 | 96.7 | 99.6 | 79.8 |
|
| 97 |
+
| jinaai/jina-embeddings-v4 | 3.8B | 89.38 | 88.5 | 60.1 | 93.8 | 99.3 | 97.3 | 96.6 | 99.1 | 80.3 |
|
| 98 |
| nomic-ai/colnomic-embed-multimodal-3b | 3B | 89.25 | 88.1 | 61.3 | 92.8 | 96.3 | 97.4 | 96.6 | 98.3 | 83.2 |
|
| 99 |
| nomic-ai/colnomic-embed-multimodal-7b | 7B | 89.00 | 88.3 | 60.1 | 92.2 | 98.8 | 96.3 | 95.9 | 99.3 | 81.1 |
|
| 100 |
| vidore/colqwen2.5-v0.2 | 3B | 89.58 | 88.9 | 63.6 | 92.5 | 99.6 | 96.1 | 95.8 | 98 | 82.1 |
|
| 101 |
+
| vidore/colqwen2-v1.0 | 2.2B | 89.18 | 88 | 61.5 | 92.5 | 99 | 95.9 | 95.5 | 98.8 | 82.2 |
|
| 102 |
| ibm-granite/granite-vision-3.3-2b-embedding | 3B | 85.98 | 84.2 | 54.6 | 89.7 | 98.9 | 96.3 | 97.3 | 98.9 | 67.9 |
|
| 103 |
| vidore/colpali-v1.3 | 3B | 85.44 | 83.3 | 58.4 | 85.5 | 97.4 | 94.6 | 96.1 | 97.4 | 70.8 |
|
| 104 |
| vidore/colpali-v1.2 | 3B | 83.16 | 77.8 | 56.6 | 82.2 | 97.5 | 93.8 | 94.4 | 94.9 | 68.1 |
|
| 105 |
+
| ColVintern-1B | 0.9B | 78.8 | 71.6 | 48.3 | 84.6 | 92.9 | 88.7 | 89.4 | 95.2 | 59.6 |
|
| 106 |
+
| **Vintern-Embedding-1B** | 0.9B | 82.85 | 75.37 | 51.79 | 86.2 | 97.52 | 93.19 | 93.97 | 97.09 | 67.72 |
|
| 107 |
|
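The average column in the ViDoRe table appears to be the unweighted mean of the eight per-task scores in each row. This is inferred from the numbers and is not stated in the README. A quick check on the two Vintern rows, with the scores copied from the table:

```python
# Per-task ViDoRe scores copied from the table rows above (8 tasks each).
vidore_scores = {
    "ColVintern-1B": [71.6, 48.3, 84.6, 92.9, 88.7, 89.4, 95.2, 59.6],
    "Vintern-Embedding-1B": [75.37, 51.79, 86.2, 97.52, 93.19, 93.97, 97.09, 67.72],
}

# Unweighted mean per model (assumed to be how the average column is built).
averages = {name: sum(scores) / len(scores) for name, scores in vidore_scores.items()}

for name, avg in averages.items():
    print(f"{name}: {avg:.4f}")

gain = averages["Vintern-Embedding-1B"] - averages["ColVintern-1B"]
print(f"gain: {gain:.2f} points")
```

The means come out at about 78.79 and 82.86, which matches the reported 78.8 and 82.85 up to rounding or truncation.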
| 108 |
## Quickstart:
|
| 109 |
|