kazukifujii commited on
Commit
bb0e62d
·
verified ·
1 Parent(s): fce02b3

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +6 -30
README.md CHANGED
@@ -70,11 +70,11 @@ The website [https://swallow-llm.github.io/](https://swallow-llm.github.io/) pro
70
  | Llama 3.1 Swallow 70B Instruct | 0.5252| 0.7846| 0.7086| 0.5063| 0.6979| 0.6888| 0.6402| 0.6653| 0.6521|
71
  | Llama 3.3 Swallow 70B Instruct | 0.5193| 0.7750| 0.7213| 0.5228| 0.6721| 0.7407| 0.6386| 0.7043| 0.6618|
72
  | Llama 3.1 Swallow 70B Instruct v0.1| 0.5676| 0.7859| 0.7490| 0.5437| 0.6383| 0.6870| 0.6121| 0.6540| 0.6547|
73
- | Llama 3.1 Swallow 70B Instruct v0.3 | 0.6063| 0.8052| 0.8410| 0.5591| 0.6280| 0.7774| 0.6920| 0.7832| 0.7115|
74
- | Qwen2-72B-Instruct |0.5699| 0.7858| 0.8222| 0.5096| 0.7032| 0.7963| 0.7728| 0.8223| 0.7228|
75
- | Qwen2.5-72B-Instruct |0.7060| 0.7866| 0.8122| 0.6968| 0.6536| 0.8301| 0.8060| 0.7841| 0.7594|
76
  | GPT-3.5 (gpt-3.5-turbo-0125) | 0.6851|0.7641| 0.7414| 0.5522| 0.5128| 0.7104| 0.6266| 0.7361| 0.6661|
77
- | GPT-4o (gpt-4o-2024-05-13) | 0.7296| 0.8540| 0.8646| 0.6641| 0.6661| 0.8274| 0.8184| 0.8085| 0.7791|
78
 
79
  ### Japanese tasks
80
 
@@ -82,19 +82,7 @@ The website [https://swallow-llm.github.io/](https://swallow-llm.github.io/) pro
82
  |---|---|---|---|---|---|---|---|---|---|---|---|
83
  | |4-shot|4-shot|4-shot|4-shot|1-shot|4-shot|4-shot|4-shot|5-shot|0-shot| |
84
  | |EM acc|Char-F1|Char-F1|Char-F1|ROUGE-2|EM acc|BLEU|BLEU|EM acc|pass@1| |
85
- | RakutenAI-7B-chat | 0.9035 | 0.2600 | 0.4619 | 0.8647 | 0.1339 | 0.2120 | 0.2667 | 0.1966 | 0.4504 | 0.2299 | 0.3980 |
86
- | Qwen2-7B-Instruct | 0.8856 | 0.3902 | 0.3859 | 0.8967 | 0.1277 | 0.5720 | 0.2041 | 0.1909 | 0.5713 | **0.5683** | 0.4793 |
87
- | Qwen2.5-7B-Instruct | 0.9151 | 0.4293 | 0.3910 | 0.8908 | 0.1676 | **0.6240** | 0.2108 | 0.1916 | **0.6252** | 0.5305 | 0.4976 |
88
- | Tanuki-8B-dpo-v1.0 | 0.2770 | 0.2937 | 0.3710 | 0.6669 | 0.1016 | 0.4280 | 0.2385 | 0.1820 | 0.3078 | 0.2555 | 0.3122 |
89
- | Llama 3 8B Instruct | 0.8785 | 0.3812 | 0.3936 | 0.8955 | 0.1273 | 0.4160 | 0.2143 | 0.2035 | 0.4719 | 0.2872 | 0.4269 |
90
- | Llama 3.1 8B Instruct | 0.8829 | 0.4272 | 0.4112 | 0.8856 | 0.1481 | 0.5280 | 0.2174 | 0.1990 | 0.5086 | 0.4976 | 0.4706 |
91
- | Llama 3 Youko 8B Instruct | 0.9196 | 0.4850 | 0.5178 | 0.9001 | 0.2085 | 0.4680 | 0.2559 | 0.1906 | 0.4691 | 0.2695 | 0.4684 |
92
- | Llama-3-ELYZA-JP-8B | 0.9017 | 0.5124 | 0.5016 | 0.9113 | 0.1677 | 0.4600 | 0.2509 | 0.1846 | 0.4829 | 0.3811 | 0.4754 |
93
- | Llama 3 heron brain 8B v0.3 | 0.9231 | 0.4933 | 0.5694 | 0.9056 | **0.2178** | 0.4560 | 0.2771 | 0.2168 | 0.4993 | 0.3177 | 0.4876 |
94
- | Llama 3 Swallow 8B Instruct | 0.9178 | 0.4963 | 0.5168 | 0.9088 | 0.1296 | 0.4880 | 0.2522 | 0.2254 | 0.4835 | 0.3927 | 0.4811 |
95
- | Llama 3.1 Swallow 8B Instruct v0.1| 0.9240 | **0.5874** | 0.5736 | **0.9170** | 0.1380 | 0.5080 | 0.2820 | **0.2282** | 0.5301 | 0.3665 | 0.5055 |
96
- | Llama 3.1 Swallow 8B Instruct v0.2| **0.9294** | 0.5601 | **0.5988** | 0.9148 | 0.1372 | 0.5280 | **0.2878** | 0.2270 | 0.5504 | 0.4079 | **0.5141** |
97
- | Llama 3.1 Swallow 8B Instruct v0.3 |0.9240 | 0.5174 | 0.5825 | 0.8954 | 0.1902 | 0.5480 | 0.2809 | 0.2278 | 0.5445 | 0.3945| 0.5105 |
98
 
99
 
100
  ### English tasks
@@ -103,19 +91,7 @@ The website [https://swallow-llm.github.io/](https://swallow-llm.github.io/) pro
103
  |---|---|---|---|---|---|---|---|---|---|---|
104
  | |4-shot|4-shot|4-shot|4-shot|4-shot|5-shot|4-shot|3-shot|0-shot| |
105
  | |Acc|EM acc|Acc|EM acc|Acc|Acc|EM acc|CoT EM Acc|pass@1| |
106
- | RakutenAI-7B-chat | 0.4160 | 0.5971 | **0.6465** | 0.3091 | 0.8886 | 0.5757 | 0.3139 | 0.4958 | 0.2671 | 0.5011 |
107
- | Qwen2-7B-Instruct | 0.4000 | 0.5468 | 0.6146 | 0.3518 | 0.8852 | 0.7073 | 0.6300 | 0.3101 | 0.6354 | 0.5646 |
108
- | Qwen2.5-7B-Instruct | **0.4280** | 0.5187 | 0.6240 | 0.2626 | 0.8761 | **0.7419** | 0.7415 | 0.2150 | **0.6360** | 0.5604 |
109
- | Tanuki-8B-dpo-v1.0 | 0.3340 | 0.2838 | 0.4696 | 0.2395 | 0.8168 | 0.3772 | 0.4867 | 0.3350 | 0.2805 | 0.4026 |
110
- | Llama 3 8B Instruct | 0.3880 | 0.6687 | 0.5834 | 0.3743 | 0.8903 | 0.6567 | **0.7453** | 0.6478 | 0.5415 | 0.6107 |
111
- | Llama 3.1 8B Instruct | 0.3700 | **0.6994** | 0.5920 | **0.3783** | **0.9037** | 0.6809 | 0.7430 | **0.6928** | 0.6293 | **0.6321** |
112
- | Llama 3 Youko 8B Instruct | 0.4080 | 0.6129 | 0.5983 | 0.3370 | 0.8981 | 0.5964 | 0.5618 | 0.4012 | 0.2750 | 0.5209 |
113
- | Llama-3-ELYZA-JP-8B | 0.3200 | 0.5502 | 0.5224 | 0.3631 | 0.8809 | 0.5875 | 0.5701 | 0.3213 | 0.4604 | 0.5084 |
114
- | Llama 3 heron brain 8B v0.3 | 0.3580 | 0.6563 | 0.5686 | 0.3726 | 0.9002 | 0.6213 | 0.5777 | 0.6409 | 0.3720 | 0.5631 |
115
- | Llama 3 Swallow 8B Instruct | 0.3720 | 0.6557 | 0.5861 | 0.3648 | 0.9002 | 0.6315 | 0.5959 | 0.6391 | 0.4238 | 0.5743 |
116
- | Llama 3.1 Swallow 8B Instruct v0.1| 0.3900 | 0.6488 | 0.6151 | 0.3553 | 0.8912 | 0.6237 | 0.6050 | 0.6417 | 0.3787 | 0.5722 |
117
- | Llama 3.1 Swallow 8B Instruct v0.2| 0.3800 | 0.6252 | 0.6031 | 0.3667 | 0.8886 | 0.6346 | 0.6202 | 0.6487 | 0.4738 | 0.5823 |
118
- | Llama 3.1 Swallow 8B Instruct v0.3 |0.3920 | 0.6295 | 0.5937 | 0.3638 | 0.8830 | 0.6280 | 0.6149 | 0.6282 | 0.4457 | 0.5754 |
119
 
120
 
121
  ## Evaluation Benchmarks
 
70
  | Llama 3.1 Swallow 70B Instruct | 0.5252| 0.7846| 0.7086| 0.5063| 0.6979| 0.6888| 0.6402| 0.6653| 0.6521|
71
  | Llama 3.3 Swallow 70B Instruct | 0.5193| 0.7750| 0.7213| 0.5228| 0.6721| 0.7407| 0.6386| 0.7043| 0.6618|
72
  | Llama 3.1 Swallow 70B Instruct v0.1| 0.5676| 0.7859| 0.7490| 0.5437| 0.6383| 0.6870| 0.6121| 0.6540| 0.6547|
73
+ | **Llama 3.1 Swallow 70B Instruct v0.3** | 0.6063| 0.8052| 0.8410| 0.5591| 0.6280| 0.7774| 0.6920| 0.7832| 0.7115|
74
+ | Qwen2-72B-Instruct |0.5699| 0.7858| 0.8222| 0.5096| **0.7032**| 0.7963| 0.7728| **0.8223**| 0.7228|
75
+ | Qwen2.5-72B-Instruct |0.7060| 0.7866| 0.8122| 0.6968| 0.6536| **0.8301**| 0.8060| 0.7841| 0.7594|
76
  | GPT-3.5 (gpt-3.5-turbo-0125) | 0.6851|0.7641| 0.7414| 0.5522| 0.5128| 0.7104| 0.6266| 0.7361| 0.6661|
77
+ | GPT-4o (gpt-4o-2024-05-13) | **0.7296**| **0.8540**| **0.8646**| **0.6641**| 0.6661| 0.8274| **0.8184**| 0.8085| **0.7791**|
78
 
79
  ### Japanese tasks
80
 
 
82
  |---|---|---|---|---|---|---|---|---|---|---|---|
83
  | |4-shot|4-shot|4-shot|4-shot|1-shot|4-shot|4-shot|4-shot|5-shot|0-shot| |
84
  | |EM acc|Char-F1|Char-F1|Char-F1|ROUGE-2|EM acc|BLEU|BLEU|EM acc|pass@1| |
85
+ | Llama 3 Youko 70B Instruct | | | | | | | | | | | |
 
 
 
 
 
 
 
 
 
 
 
 
86
 
87
 
88
  ### English tasks
 
91
  |---|---|---|---|---|---|---|---|---|---|---|
92
  | |4-shot|4-shot|4-shot|4-shot|4-shot|5-shot|4-shot|3-shot|0-shot| |
93
  | |Acc|EM acc|Acc|EM acc|Acc|Acc|EM acc|CoT EM Acc|pass@1| |
94
+ | Llama 3 Youko 70B Instruct | | | | | | | | | | |
 
 
 
 
 
 
 
 
 
 
 
 
95
 
96
 
97
  ## Evaluation Benchmarks