Yin-Xie committed · Commit 660054d · verified · 1 Parent(s): 9e89ffa

Update README.md

Files changed (1)
  1. README.md +29 -29
README.md CHANGED
@@ -44,35 +44,35 @@ Complete end-to-end training framework designed for maximum efficiency:
 ## Evaluation Results
 All evaluations were conducted using [lmms_eval](https://github.com/EvolvingLMMs-Lab/lmms-eval).
 
-| | **LLaVA-OV-1.5-8B** | **Qwen2.5 VL 7B** | **LLaVA-OV-1.5-4B** | **Qwen2.5 VL 3B** |
-|:----------------------------------|:---------------:|:-------------:|:---------------:|:-------------:|
-| MMMU (Validation) | **55.44** | 51.33 | **51.44** | 46.44 |
-| MMMU-Pro (Standard) | **37.40** | 36.30 | **33.24** | 31.10 |
-| MMMU-Pro (Vision) | 25.15 | **32.83** | **23.53** | 21.27 |
-| MMBench (English; Test) | **84.14** | 83.40 | **82.29** | 77.97 |
-| MMBench (Chinese; Test) | 81.00 | **81.61** | **76.73** | 74.55 |
-| MME-RealWorld (English) | **62.31** | 57.33 | **57.16** | 51.60 |
-| MME-RealWorld (Chinese) | **56.11** | 51.50 | 21.38 | **45.38** |
-| AI2D (With Mask) | **84.16** | 82.58 | **84.62** | 78.56 |
-| AI2D (Without Mask) | **94.11** | 93.36 | **92.84** | 90.74 |
-| CV-Bench | **80.82** | 79.95 | **74.00** | 71.53 |
-| VL-RewardBench | 45.90 | **49.65** | **45.90** | 42.06 |
-| V* | **78.01** | 76.96 | 66.49 | **69.63** |
-| PixmoCount | 62.19 | **63.33** | **59.17** | 50.85 |
-| CountBench | **88.19** | 86.35 | **77.80** | 72.51 |
-| ChartQA | **86.48** | 84.08 | **85.11** | 83.36 |
-| CharXiv (Direct Questions) | **74.10** | 69.80 | **70.70** | 58.20 |
-| DocVQA (Test) | **95.00** | 94.93 | **93.48** | 92.67 |
-| InfoVQA (Test) | 78.42 | **81.67** | **75.27** | 75.63 |
-| WeMath | **33.62** | 33.33 | **28.00** | 18.38 |
-| MathVista (Mini) | **69.57** | 68.60 | **67.36** | 60.23 |
-| MathVision | **25.56** | 22.37 | **22.76** | 21.25 |
-| MMStar | **67.72** | 62.54 | **64.22** | 55.86 |
-| SEED-Bench (Image) | 77.32 | **77.53** | **76.74** | 74.81 |
-| ScienceQA | **94.98** | 88.75 | **92.05** | 83.33 |
-| SEED-Bench 2-Plus | 69.21 | **70.93** | **68.42** | 68.64 |
-| OCRBench | 82.90 | **84.20** | 77.80 | **79.20** |
-| RealWorldQA | 68.10 | **68.50** | **64.05** | 60.00 |
+| | **LLaVA-OV-1.5-8B** | **Qwen2.5 VL 7B** |
+|:----------------------------------|:---------------:|:-------------:|
+| MMMU (Validation) | **55.44** | 51.33 |
+| MMMU-Pro (Standard) | **37.40** | 36.30 |
+| MMMU-Pro (Vision) | 25.15 | **32.83** |
+| MMBench (English; Test) | **84.14** | 83.40 |
+| MMBench (Chinese; Test) | 81.00 | **81.61** |
+| MME-RealWorld (English) | **62.31** | 57.33 |
+| MME-RealWorld (Chinese) | **56.11** | 51.50 |
+| AI2D (With Mask) | **84.16** | 82.58 |
+| AI2D (Without Mask) | **94.11** | 93.36 |
+| CV-Bench | **80.82** | 79.95 |
+| VL-RewardBench | 45.90 | **49.65** |
+| V* | **78.01** | 76.96 |
+| PixmoCount | 62.19 | **63.33** |
+| CountBench | **88.19** | 86.35 |
+| ChartQA | **86.48** | 84.08 |
+| CharXiv (Direct Questions) | **74.10** | 69.80 |
+| DocVQA (Test) | **95.00** | 94.93 |
+| InfoVQA (Test) | 78.42 | **81.67** |
+| WeMath | **33.62** | 33.33 |
+| MathVista (Mini) | **69.57** | 68.60 |
+| MathVision | **25.56** | 22.37 |
+| MMStar | **67.72** | 62.54 |
+| SEED-Bench (Image) | 77.32 | **77.53** |
+| ScienceQA | **94.98** | 88.75 |
+| SEED-Bench 2-Plus | 69.21 | **70.93** |
+| OCRBench | 82.90 | **84.20** |
+| RealWorldQA | 68.10 | **68.50** |
 
 ### Using 🤗 Transformers to Chat
 Here we show a code snippet to show you how to use the chat model with `transformers` and `qwen_vl_utils`:
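The snippet the README refers to lies outside this hunk. As a rough sketch only — not the model card's actual code — a chat call with a Qwen-VL-style model via `transformers` and `qwen_vl_utils` could look like the following; the repo id passed to `chat` and the exact processor behavior are assumptions, so defer to the README's own example:

```python
# Rough sketch, under assumptions: the model follows the Qwen2.5-VL-style
# conversational API that `qwen_vl_utils` targets; any repo id passed in
# is a placeholder chosen by the caller.

def build_messages(image_url, question):
    """Build one user turn in the Qwen-VL chat format: an image plus a text prompt."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image_url},
                {"type": "text", "text": question},
            ],
        }
    ]


def chat(model_id, image_url, question, max_new_tokens=256):
    # Heavy imports kept local so build_messages stays usable without them.
    from transformers import AutoModelForCausalLM, AutoProcessor
    from qwen_vl_utils import process_vision_info

    # trust_remote_code because the repo ships custom modeling code.
    processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, trust_remote_code=True, device_map="auto"
    )

    messages = build_messages(image_url, question)
    text = processor.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    image_inputs, video_inputs = process_vision_info(messages)
    inputs = processor(
        text=[text], images=image_inputs, videos=video_inputs, return_tensors="pt"
    ).to(model.device)

    generated = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Strip the prompt tokens before decoding, so only the reply is returned.
    trimmed = generated[:, inputs["input_ids"].shape[1]:]
    return processor.batch_decode(trimmed, skip_special_tokens=True)[0]
```

The message-building step is plain data and independent of any checkpoint; only `chat` touches the network and GPU.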