Update README.md
    	
        README.md
    CHANGED
    
    | @@ -24,7 +24,6 @@ tags: | |
| 24 | 
             
            [\[📜 InternVL 1.0 Paper\]](https://arxiv.org/abs/2312.14238)  [\[📜 InternVL 1.5 Report\]](https://arxiv.org/abs/2404.16821)
         | 
| 25 | 
             
            [\[🗨️ Chat Demo\]](https://internvl.opengvlab.com/)  [\[🤗 HF Demo\]](https://huggingface.co/spaces/OpenGVLab/InternVL)  [\[🚀 Quick Start\]](#quick-start)    [\[📖 Documents\]](https://internvl.readthedocs.io/en/latest/)
         | 
| 26 |  | 
| 27 | 
            -
             | 
| 28 | 
             
            
         | 
| 29 |  | 
| 30 | 
             
            ## Introduction
         | 
| @@ -53,49 +52,74 @@ InternVL 2.5 is a multimodal large language model series, featuring models of va | |
| 53 |  | 
| 54 | 
             
            ### Image Benchmarks
         | 
| 55 |  | 
| 56 | 
            -
            | Benchmark | 
| 57 | 
            -
             | 
| 58 | 
            -
            | MMMU (val) | 
| 59 | 
            -
            | MMMU (test) | 
| 60 | 
            -
            | MMMU-PRO (overall) | 
| 61 | 
            -
            | MathVista (mini) | 
| 62 | 
            -
            | MathVision (mini) | 
| 63 | 
            -
            | MathVision (full) | 
| 64 | 
            -
            | MathVerse (mini) | 
| 65 | 
            -
            | Olympiad Bench | 
| 66 | 
            -
            | AI2D (w / wo M) | 
| 67 | 
            -
            | ChartQA (test avg.) | 
| 68 | 
            -
            | TextVQA (val) | 
| 69 | 
            -
            | DocVQA (test) | 
| 70 | 
            -
            | InfoVQA (test) | 
| 71 | 
            -
            | OCR-Bench | 
| 72 | 
            -
            | SEED-2 Plus | 
| 73 | 
            -
            | CharXiv (RQ / DQ) | 
| 74 | 
            -
            | VCR-EN-Easy (EM / Jaccard) | - | 
| 75 | 
            -
            | BLINK (val) | 
| 76 | 
            -
            | Mantis Eval | 
| 77 | 
            -
            | MMIU | 
| 78 | 
            -
            | Muir Bench | 
| 79 | 
            -
            | MMT (val) | 
| 80 | 
            -
            | MIRB (avg.) | 
| 81 | 
            -
            | RealWorld QA | 
| 82 | 
            -
            | MME-RW (EN) | 
| 83 | 
            -
            | WildVision (win rate)| - | 
| 84 | 
            -
            | R-Bench | 
| 85 | 
            -
            | MME (sum) | 
| 86 | 
            -
            | MMB (EN / CN) | 
| 87 | 
            -
            | MMBv1.1 (EN) | 
| 88 | 
            -
            | MMVet (turbo) | 
| 89 | 
            -
            | MMVetv2 (0613) | 
| 90 | 
            -
            | MMStar | 
| 91 | 
            -
            | HallBench (avg.) | 
| 92 | 
            -
            | MMHal (score) | 
| 93 | 
            -
            | CRPE (relation) | 
| 94 | 
            -
            | POPE (avg.) | 
| 95 | 
            -
             | 
| 96 |  | 
| 97 | 
             
            ### Video Benchmarks
         | 
| 98 |  | 
| 99 | 
             
            ### Multimodal Multilingual Understanding
         | 
| 100 |  | 
| 101 | 
             
            <table>
         | 
|  | |
| 24 | 
             
            [\[📜 InternVL 1.0 Paper\]](https://arxiv.org/abs/2312.14238)  [\[📜 InternVL 1.5 Report\]](https://arxiv.org/abs/2404.16821)
         | 
| 25 | 
             
            [\[🗨️ Chat Demo\]](https://internvl.opengvlab.com/)  [\[🤗 HF Demo\]](https://huggingface.co/spaces/OpenGVLab/InternVL)  [\[🚀 Quick Start\]](#quick-start)    [\[📖 Documents\]](https://internvl.readthedocs.io/en/latest/)
         | 
| 26 |  | 
|  | |
| 27 | 
             
            
         | 
| 28 |  | 
| 29 | 
             
            ## Introduction
         | 
|  | |
| 52 |  | 
| 53 | 
             
            ### Image Benchmarks
         | 
| 54 |  | 
| 55 | 
            +
            | Benchmark                  | LLaVA-OneVision-0.5B | InternVL2.5-1B | Qwen2-VL-2B | Aquila-VL-2B | InternVL2.5-2B |
         | 
| 56 | 
            +
            |----------------------------|----------------------|----------------|-------------|--------------|----------------|
         | 
| 57 | 
            +
            | MMMU (val)                 | 31.4                 | 40.9           | 41.1        | 47.4         | 43.6           |
         | 
| 58 | 
            +
            | MMMU (test)                | -                    | 35.8           | -           | -            | 38.2           |
         | 
| 59 | 
            +
            | MMMU-PRO (overall)         | -                    | 19.4           | 21.2        | 26.2         | 23.7           |
         | 
| 60 | 
            +
            | MathVista (mini)           | 34.8                 | 43.2           | 43.0        | 59.0         | 51.3           |
         | 
| 61 | 
            +
            | MathVision (mini)          | -                    | 16.8           | 19.7        | 21.1         | 13.5           |
         | 
| 62 | 
            +
            | MathVision (full)          | -                    | 14.4           | 12.4        | 18.4         | 14.7           |
         | 
| 63 | 
            +
            | MathVerse (mini)           | 17.9                 | 28.0           | 21.0        | 26.2         | 30.6           |
         | 
| 64 | 
            +
            | Olympiad Bench             | -                    | 1.7            | -           | -            | 2.0            |
         | 
| 65 | 
            +
            | AI2D (w / wo M)            | 57.1 / -             | 69.3 / 77.8    | 74.7 / 84.6 | 75.0 / -     | 74.9 / 83.5    |
         | 
| 66 | 
            +
            | ChartQA (test avg.)        | 61.4                 | 75.9           | 73.5        | 76.5         | 79.2           |
         | 
| 67 | 
            +
            | TextVQA (val)              | -                    | 72.0           | 79.7        | 76.4         | 74.3           |
         | 
| 68 | 
            +
            | DocVQA (test)              | 70.0                 | 84.8           | 90.1        | 85.0         | 88.7           |
         | 
| 69 | 
            +
            | InfoVQA (test)             | 41.8                 | 56.0           | 65.5        | 58.3         | 60.9           |
         | 
| 70 | 
            +
            | OCR-Bench                  | 565                  | 785            | 809         | 772          | 804            |
         | 
| 71 | 
            +
            | SEED-2 Plus                | -                    | 59.0           | 62.4        | 63.0         | 60.9           |
         | 
| 72 | 
            +
            | CharXiv (RQ / DQ)          | -                    | 19.0 / 38.4    | -           | -            | 21.3 / 49.7    |
         | 
| 73 | 
            +
            | VCR-EN-Easy (EM / Jaccard) | -                    | 91.5 / 97.0    | 81.5 / -    | 70.0 / -     | 93.2 / 97.6    |
         | 
| 74 | 
            +
            | BLINK (val)                | 52.1                 | 42.0           | 44.4        |              | 44.0           |
         | 
| 75 | 
            +
            | Mantis Eval                | 39.6                 | 51.2           | -           | -            | 54.8           |
         | 
| 76 | 
            +
            | MMIU                       | -                    | 38.5           | -           | -            | 43.5           |
         | 
| 77 | 
            +
            | Muir Bench                 | 25.5                 | 29.9           | -           | -            | 40.6           |
         | 
| 78 | 
            +
            | MMT (val)                  | -                    | 50.3           | 55.1        | -            | 54.5           |
         | 
| 79 | 
            +
            | MIRB (avg.)                | -                    | 35.6           | -           | -            | 36.4           |
         | 
| 80 | 
            +
            | RealWorld QA               | 55.6                 | 57.5           | 62.6        | -            | 60.1           |
         | 
| 81 | 
            +
            | MME-RW (EN)                | -                    | 44.2           | -           | -            | 48.8           |
         | 
| 82 | 
            +
            | WildVision (win rate)      | -                    | 43.4           | -           | -            | 44.2           |
         | 
| 83 | 
            +
            | R-Bench                    | -                    | 59.0           | -           | -            | 62.2           |
         | 
| 84 | 
            +
            | MME (sum)                  | 1438.0               | 1950.5         | 1872.0      | -            | 2138.2         |
         | 
| 85 | 
            +
            | MMB (EN / CN)              | 61.6 / 55.5          | 70.7 / 66.3    | 74.9 / 73.5 | -            | 74.7 / 71.9    |
         | 
| 86 | 
            +
            | MMBv1.1 (EN)               | 59.6                 | 68.4           | 72.2        | -            | 72.2           |
         | 
| 87 | 
            +
            | MMVet (turbo)              | 32.2                 | 48.8           | 49.5        | -            | 60.8           |
         | 
| 88 | 
            +
            | MMVetv2 (0613)             | -                    | 43.2           | -           | -            | 52.3           |
         | 
| 89 | 
            +
            | MMStar                     | 37.7                 | 50.1           | 48.0        | -            | 53.7           |
         | 
| 90 | 
            +
            | HallBench (avg.)           | 27.9                 | 39.0           | 41.7        | -            | 42.6           |
         | 
| 91 | 
            +
            | MMHal (score)              | -                    | 2.49           | -           | -            | 2.94           |
         | 
| 92 | 
            +
            | CRPE (relation)            | -                    | 60.9           | -           | -            | 70.2           |
         | 
| 93 | 
            +
            | POPE (avg.)                | -                    | 89.9           | -           | -            | 90.6           |
         | 
|  | |
| 94 |  | 
| 95 | 
             
            ### Video Benchmarks
         | 
| 96 |  | 
| 97 | 
            +
            | Model Name                                  | Video-MME (wo / w sub)        | MVBench | MMBench-Video (val) | MLVU (M-Avg) | LongVideoBench (val total) | CG-Bench v1.1 (long / clue acc.)     |
         | 
| 98 | 
            +
            |---------------------------------------------|-------------|------|-------|-------|------|-------------|
         | 
| 99 | 
            +
            | **InternVL2.5-1B**                              | 50.3 / 52.3 | 64.3 | 1.36  | 57.3  | 47.9 | -           |
         | 
| 100 | 
            +
            | Qwen2-VL-2B          | 55.6 / 60.4 | 63.2 | -     | -     | -    | -           |
         | 
| 101 | 
            +
            | **InternVL2.5-2B**                              | 51.9 / 54.1 | 68.8 | 1.44  | 61.4  | 52.0 | -           |
         | 
| 102 | 
            +
            | **InternVL2.5-4B**                              | 62.3 / 63.6 | 71.6 | 1.73  | 68.3  | 55.2 | -           |
         | 
| 103 | 
            +
            | VideoChat2-HD        | 45.3 / 55.7 | 62.3 | 1.22  | 47.9  | -    | -           |
         | 
| 104 | 
            +
            | MiniCPM-V-2.6         | 60.9 / 63.6 | -    | 1.70  | -     | 54.9 | -           |
         | 
| 105 | 
            +
            | LLaVA-OneVision-7B     | 58.2 / -    | 56.7 | -     | -     | -    | -           |
         | 
| 106 | 
            +
            | Qwen2-VL-7B          | 63.3 / 69.0 | 67.0 | 1.44  | -     | 55.6 | -           |
         | 
| 107 | 
            +
            | **InternVL2.5-8B**                              | 64.2 / 66.9 | 72.0 | 1.68  | 68.9  | 60.0 | -           |
         | 
| 108 | 
            +
            | **InternVL2.5-26B**                             | 66.9 / 69.2 | 75.2 | 1.86  | 72.3  | 59.9 | -           |
         | 
| 109 | 
            +
            | Oryx-1.5-32B                                | 67.3 / 74.9 | 70.1 | 1.52  | 72.3  | -    | -           |
         | 
| 110 | 
            +
            | VILA-1.5-40B             | 60.1 / 61.1 | -    | 1.61  | 56.7  | -    | -           |
         | 
| 111 | 
            +
            | **InternVL2.5-38B**                             | 70.7 / 73.1 | 74.4 | 1.82  | 75.3  | 63.3 | -           |
         | 
| 112 | 
            +
            | GPT-4V/4T             | 59.9 / 63.3 | 43.7 | 1.53  | 49.2  | 59.1 | -           |
         | 
| 113 | 
            +
            | GPT-4o-20240513                | 71.9 / 77.2 | -    | 1.63  | 64.6  | 66.7 | -           |
         | 
| 114 | 
            +
            | GPT-4o-20240806                | -           | -    | 1.87  | -     | -    | -           |
         | 
| 115 | 
            +
            | Gemini-1.5-Pro     | 75.0 / 81.3 | -    | 1.30  | -     | 64.0 | -           |
         | 
| 116 | 
            +
            | VideoLLaMA2-72B | 61.4 / 63.1 | 62.0 | -     | -     | -    | -           |
         | 
| 117 | 
            +
            | LLaVA-OneVision-72B    | 66.2 / 69.5 | 59.4 | -     | 66.4  | 61.3 | -           |
         | 
| 118 | 
            +
            | Qwen2-VL-72B         | 71.2 / 77.8 | 73.6 | 1.70  | -     | -    | 41.3 / 56.2 |
         | 
| 119 | 
            +
            | InternVL2-Llama3-76B     | 64.7 / 67.8 | 69.6 | 1.71  | 69.9  | 61.1 | -           |
         | 
| 120 | 
            +
            | **InternVL2.5-78B**                             | 72.1 / 74.0 | 76.4 | 1.97  | 75.7  | 63.6 | 42.2 / 58.5 |
         | 
| 121 | 
            +
             | 
| 122 | 
            +
             | 
| 123 | 
             
            ### Multimodal Multilingual Understanding
         | 
| 124 |  | 
| 125 | 
             
            <table>
         | 
