Update README.md
README.md
CHANGED
|
@@ -1,3 +1,47 @@
|
|
| 1 |
-
---
|
| 2 |
-
license: apache-2.0
|
| 3 |
-
|
| 1 |
+
---
|
| 2 |
+
license: apache-2.0
|
| 3 |
+
inference: false
|
| 4 |
+
pipeline_tag: image-text-to-text
|
| 5 |
+
datasets:
|
| 6 |
+
- liuhaotian/LLaVA-Pretrain
|
| 7 |
+
- lmms-lab/LLaVA-ReCap-CC12M
|
| 8 |
+
- lmms-lab/LLaVA-NeXT-Data
|
| 9 |
+
---
|
| 10 |
+
|
| 11 |
+
<br>
|
| 12 |
+
<br>
|
| 13 |
+
|
| 14 |
+
# ViCToR Model Card
|
| 15 |
+
|
| 16 |
+
## Model details
|
| 17 |
+
|
| 18 |
+
**Paper or resources for more information:**
|
| 19 |
+
https://github.com/deepglint/Victor
|
| 20 |
+
|
| 21 |
+
|
| 22 |
+
**Where to send questions or comments about the model:**
|
| 23 |
+
https://github.com/deepglint/Victor/issues
|
| 24 |
+
|
| 25 |
+
|
| 26 |
+
## Results
|
| 27 |
+
| Benchmark | ViCToR-7B | LLaVA-1.5-13B | LLaVA-NeXT-8B | Ross |
|
| 28 |
+
| ---------------- | --------- | ------------- | ------------- | ---- |
|
| 29 |
+
| MMStar | **54.3** | 34.3 | 43.9 | 53.9 |
|
| 30 |
+
| RealWorldQA | **65.6** | 55.3 | 58.4 | 58.7 |
|
| 31 |
+
| MMBench^(cn,val) | **79.0** | 67.8 | – | – |
|
| 32 |
+
| OCRBench | 556 | 337 | 531 | 553 |
|
| 33 |
+
| POPE | 88.4 | 88.4 | 87.1 | 88.1 |
|
| 34 |
+
| MMMU | 48.9 | 37.0 | 43.1 | 49.0 |
|
| 35 |
+
| AI2D | 79.5 | 61.1 | 72.8 | 79.5 |
|
| 36 |
+
| MME | 2071 | 1781 | 1908 | 1854 |
|
| 37 |
+
| SEED^(f) | **75.7** | 68.2 | 72.5 | 73.6 |
|
| 38 |
+
|
| 39 |
+
## Citation
|
| 40 |
+
```
|
| 41 |
+
@inproceedings{Xie2024ViCToRIV,
|
| 42 |
+
title={ViCToR: Improving Visual Comprehension via Token Reconstruction for Pretraining LMMs},
|
| 43 |
+
author={Yin Xie and Kaicheng Yang and Ninghua Yang and Weimo Deng and Xiangzi Dai and Tiancheng Gu and Yumeng Wang and Xiang An and Yongle Zhao and Ziyong Feng and Jiankang Deng},
|
| 44 |
+
year={2024},
|
| 45 |
+
url={https://api.semanticscholar.org/CorpusID:273482504}
|
| 46 |
+
}
|
| 47 |
+
```
|