Tags: Image-Text-to-Text · Safetensors · qwen2 · conversational
Commit 7b9f868 (verified), committed by Yin-Xie · 1 Parent(s): 8b8f0b3

Update README.md

Files changed (1)
  1. README.md +47 -3
README.md CHANGED
@@ -1,3 +1,47 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ inference: false
+ pipeline_tag: image-text-to-text
+ datasets:
+ - liuhaotian/LLaVA-Pretrain
+ - lmms-lab/LLaVA-ReCap-CC12M
+ - lmms-lab/LLaVA-NeXT-Data
+ ---
+
+ <br>
+ <br>
+
+ # ViCToR Model Card
+
+ ## Model details
+
+ **Paper or resources for more information:**
+ https://github.com/deepglint/Victor
+
+
+ **Where to send questions or comments about the model:**
+ https://github.com/deepglint/Victor/issues
+
+
+ ## Results
+ | Benchmark        | ViCToR-7B | LLaVA-1.5-13B | LLaVA-NeXT-8B | Ross |
+ | ---------------- | --------- | ------------- | ------------- | ---- |
+ | MMStar           | **54.3**  | 34.3          | 43.9          | 53.9 |
+ | RealWorldQA      | **65.6**  | 55.3          | 58.4          | 58.7 |
+ | MMBench^(cn,val) | **79.0**  | 67.8          | –             | –    |
+ | OCRBench         | 556       | 337           | 531           | 553  |
+ | POPE             | 88.4      | 88.4          | 87.1          | 88.1 |
+ | MMMU             | 48.9      | 37.0          | 43.1          | 49.0 |
+ | AI2D             | 79.5      | 61.1          | 72.8          | 79.5 |
+ | MME              | 2071      | 1781          | 1908          | 1854 |
+ | SEED^(f)         | **75.7**  | 68.2          | 72.5          | 73.6 |
+
+ ## Citation
+ ```
+ @inproceedings{Xie2024ViCToRIV,
+ title={ViCToR: Improving Visual Comprehension via Token Reconstruction for Pretraining LMMs},
+ author={Yin Xie and Kaicheng Yang and Ninghua Yang and Weimo Deng and Xiangzi Dai and Tiancheng Gu and Yumeng Wang and Xiang An and Yongle Zhao and Ziyong Feng and Jiankang Deng},
+ year={2024},
+ url={https://api.semanticscholar.org/CorpusID:273482504}
+ }
+ ```