DeepGlint-AI/ViCToR-LLaVA-SigLIP2-Qwen2.5-7b

Image-Text-to-Text


Yin-Xie committed on Aug 15

Commit 7b9f868 · verified · 1 Parent(s): 8b8f0b3

Update README.md

Files changed (1)

README.md +47 -3


@@ -1,3 +1,47 @@
----
-license: apache-2.0
----

+---
+license: apache-2.0
+inference: false
+pipeline_tag: image-text-to-text
+datasets:
+- liuhaotian/LLaVA-Pretrain
+- lmms-lab/LLaVA-ReCap-CC12M
+- lmms-lab/LLaVA-NeXT-Data
+---
+<br>
+<br>
+# ViCToR Model Card
+## Model details
+**Paper or resources for more information:**
+https://github.com/deepglint/Victor
+**Where to send questions or comments about the model:**
+https://github.com/deepglint/Victor/issues
+## Results
+| Benchmark        | ViCToR-7B | LLaVA-1.5-13B | LLaVA-NeXT-8B | Ross |
+| ---------------- | --------- | ------------- | ------------- | ---- |
+| MMStar           | **54.3**  | 34.3          | 43.9          | 53.9 |
+| RealWorldQA      | **65.6**  | 55.3          | 58.4          | 58.7 |
+| MMBench^(cn,val) | **79.0**  | 67.8          | –             | –    |
+| OCRBench         | 556       | 337           | 531           | 553  |
+| POPE             | 88.4      | 88.4          | 87.1          | 88.1 |
+| MMMU             | 48.9      | 37.0          | 43.1          | 49.0 |
+| AI2D             | 79.5      | 61.1          | 72.8          | 79.5 |
+| MME              | 2071      | 1781          | 1908          | 1854 |
+| SEED^(f)         | **75.7**  | 68.2          | 72.5          | 73.6 |
+## Citation
+```bibtex
+@inproceedings{Xie2024ViCToRIV,
+  title={ViCToR: Improving Visual Comprehension via Token Reconstruction for Pretraining LMMs},
+  author={Yin Xie and Kaicheng Yang and Ninghua Yang and Weimo Deng and Xiangzi Dai and Tiancheng Gu and Yumeng Wang and Xiang An and Yongle Zhao and Ziyong Feng and Jiankang Deng},
+  year={2024},
+  url={https://api.semanticscholar.org/CorpusID:273482504}
+}
+```