INFO: 2024-10-17 07:12:53,947: llmtf.base.evaluator: Starting eval on ['darumeru/multiq']
INFO: 2024-10-17 07:12:53,947: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [131508]
INFO: 2024-10-17 07:12:53,947: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-17 07:13:01,539: llmtf.base.darumeru/MultiQ: Loading Dataset: 7.59s
INFO: 2024-10-17 07:18:20,829: llmtf.base.darumeru/MultiQ: Processing Dataset: 319.29s
INFO: 2024-10-17 07:18:20,829: llmtf.base.darumeru/MultiQ: Results for darumeru/MultiQ:
INFO: 2024-10-17 07:18:20,830: llmtf.base.darumeru/MultiQ: {'f1': 0.3485719410941241, 'em': 0.24282982791587}
INFO: 2024-10-17 07:18:20,835: llmtf.base.evaluator: Ended eval
INFO: 2024-10-17 07:18:20,835: llmtf.base.evaluator:
mean darumeru/MultiQ
0.296 0.296
INFO: 2024-10-17 07:18:30,261: llmtf.base.evaluator: Starting eval on ['darumeru/parus']
INFO: 2024-10-17 07:18:30,261: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [131508]
INFO: 2024-10-17 07:18:30,261: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-17 07:18:34,809: llmtf.base.darumeru/PARus: Loading Dataset: 4.55s
INFO: 2024-10-17 07:18:39,184: llmtf.base.darumeru/PARus: Processing Dataset: 4.37s
INFO: 2024-10-17 07:18:39,184: llmtf.base.darumeru/PARus: Results for darumeru/PARus:
INFO: 2024-10-17 07:18:39,194: llmtf.base.darumeru/PARus: {'acc': 0.68}
INFO: 2024-10-17 07:18:39,194: llmtf.base.evaluator: Ended eval
INFO: 2024-10-17 07:18:39,195: llmtf.base.evaluator:
mean darumeru/MultiQ darumeru/PARus
0.488 0.296 0.680
INFO: 2024-10-17 07:18:48,257: llmtf.base.evaluator: Starting eval on ['darumeru/rcb']
INFO: 2024-10-17 07:18:48,258: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [131508]
INFO: 2024-10-17 07:18:48,258: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-17 07:18:52,169: llmtf.base.darumeru/RCB: Loading Dataset: 3.91s
INFO: 2024-10-17 07:18:57,742: llmtf.base.darumeru/RCB: Processing Dataset: 5.57s
INFO: 2024-10-17 07:18:57,742: llmtf.base.darumeru/RCB: Results for darumeru/RCB:
INFO: 2024-10-17 07:18:57,745: llmtf.base.darumeru/RCB: {'acc': 0.5272727272727272, 'f1_macro': 0.47584611730940257}
INFO: 2024-10-17 07:18:57,746: llmtf.base.evaluator: Ended eval
INFO: 2024-10-17 07:18:57,747: llmtf.base.evaluator:
mean darumeru/MultiQ darumeru/PARus darumeru/RCB
0.492 0.296 0.680 0.502
INFO: 2024-10-17 07:19:07,388: llmtf.base.evaluator: Starting eval on ['darumeru/ruopenbookqa']
INFO: 2024-10-17 07:19:07,388: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [131508]
INFO: 2024-10-17 07:19:07,388: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-17 07:19:13,124: llmtf.base.darumeru/ruOpenBookQA: Loading Dataset: 5.74s
INFO: 2024-10-17 07:20:12,666: llmtf.base.darumeru/ruOpenBookQA: Processing Dataset: 59.54s
INFO: 2024-10-17 07:20:12,666: llmtf.base.darumeru/ruOpenBookQA: Results for darumeru/ruOpenBookQA:
INFO: 2024-10-17 07:20:12,678: llmtf.base.darumeru/ruOpenBookQA: {'acc': 0.7207903780068728, 'f1_macro': 0.7206838429510474}
INFO: 2024-10-17 07:20:12,689: llmtf.base.evaluator: Ended eval
INFO: 2024-10-17 07:20:12,690: llmtf.base.evaluator:
mean darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/ruOpenBookQA
0.549 0.296 0.680 0.502 0.721
INFO: 2024-10-17 07:20:21,945: llmtf.base.evaluator: Starting eval on ['darumeru/ruworldtree']
INFO: 2024-10-17 07:20:21,945: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [131508]
INFO: 2024-10-17 07:20:21,945: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-17 07:20:25,640: llmtf.base.darumeru/ruWorldTree: Loading Dataset: 3.69s
INFO: 2024-10-17 07:20:28,309: llmtf.base.darumeru/ruWorldTree: Processing Dataset: 2.67s
INFO: 2024-10-17 07:20:28,310: llmtf.base.darumeru/ruWorldTree: Results for darumeru/ruWorldTree:
INFO: 2024-10-17 07:20:28,312: llmtf.base.darumeru/ruWorldTree: {'acc': 0.8952380952380953, 'f1_macro': 0.8944916936662219}
INFO: 2024-10-17 07:20:28,313: llmtf.base.evaluator: Ended eval
INFO: 2024-10-17 07:20:28,314: llmtf.base.evaluator:
mean darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/ruOpenBookQA darumeru/ruWorldTree
0.619 0.296 0.680 0.502 0.721 0.895
INFO: 2024-10-17 07:20:37,966: llmtf.base.evaluator: Starting eval on ['darumeru/rwsd']
INFO: 2024-10-17 07:20:37,967: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [131508]
INFO: 2024-10-17 07:20:37,967: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-17 07:20:42,582: llmtf.base.darumeru/RWSD: Loading Dataset: 4.62s
INFO: 2024-10-17 07:20:47,988: llmtf.base.darumeru/RWSD: Processing Dataset: 5.41s
INFO: 2024-10-17 07:20:47,988: llmtf.base.darumeru/RWSD: Results for darumeru/RWSD:
INFO: 2024-10-17 07:20:47,989: llmtf.base.darumeru/RWSD: {'acc': 0.5343137254901961}
INFO: 2024-10-17 07:20:47,989: llmtf.base.evaluator: Ended eval
INFO: 2024-10-17 07:20:47,990: llmtf.base.evaluator:
mean darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/ruOpenBookQA darumeru/ruWorldTree
0.605 0.296 0.680 0.502 0.534 0.721 0.895
INFO: 2024-10-17 07:20:57,317: llmtf.base.evaluator: Starting eval on ['daru/treewayextractive']
INFO: 2024-10-17 07:20:57,317: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [131508]
INFO: 2024-10-17 07:20:57,317: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-17 07:21:13,664: llmtf.base.daru/treewayextractive: Loading Dataset: 16.35s
INFO: 2024-10-17 07:24:01,803: llmtf.base.daru/treewayextractive: Processing Dataset: 168.14s
INFO: 2024-10-17 07:24:01,803: llmtf.base.daru/treewayextractive: Results for daru/treewayextractive:
INFO: 2024-10-17 07:24:02,038: llmtf.base.daru/treewayextractive: {'r-prec': 0.3983020202020202}
INFO: 2024-10-17 07:24:02,084: llmtf.base.evaluator: Ended eval
INFO: 2024-10-17 07:24:02,085: llmtf.base.evaluator:
mean daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/ruOpenBookQA darumeru/ruWorldTree
0.575 0.398 0.296 0.680 0.502 0.534 0.721 0.895
INFO: 2024-10-17 07:24:11,344: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/rummlu']
INFO: 2024-10-17 07:24:11,345: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [131508]
INFO: 2024-10-17 07:24:11,345: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-17 07:29:12,497: llmtf.base.nlpcoreteam/ruMMLU: Loading Dataset: 301.15s
INFO: 2024-10-17 07:35:18,210: llmtf.base.nlpcoreteam/ruMMLU: Processing Dataset: 365.71s
INFO: 2024-10-17 07:35:18,210: llmtf.base.nlpcoreteam/ruMMLU: Results for nlpcoreteam/ruMMLU:
INFO: 2024-10-17 07:35:18,279: llmtf.base.nlpcoreteam/ruMMLU: metric
subject
abstract_algebra 0.330000
anatomy 0.422222
astronomy 0.625000
business_ethics 0.580000
clinical_knowledge 0.592453
college_biology 0.506944
college_chemistry 0.340000
college_computer_science 0.540000
college_mathematics 0.370000
college_medicine 0.549133
college_physics 0.431373
computer_security 0.570000
conceptual_physics 0.536170
econometrics 0.385965
electrical_engineering 0.531034
elementary_mathematics 0.515873
formal_logic 0.333333
global_facts 0.390000
high_school_biology 0.670968
high_school_chemistry 0.487685
high_school_computer_science 0.660000
high_school_european_history 0.733333
high_school_geography 0.696970
high_school_government_and_politics 0.569948
high_school_macroeconomics 0.523077
high_school_mathematics 0.429630
high_school_microeconomics 0.521008
high_school_physics 0.443709
high_school_psychology 0.706422
high_school_statistics 0.523148
high_school_us_history 0.642157
high_school_world_history 0.729958
human_aging 0.587444
human_sexuality 0.641221
international_law 0.694215
jurisprudence 0.638889
logical_fallacies 0.533742
machine_learning 0.419643
management 0.650485
marketing 0.726496
medical_genetics 0.550000
miscellaneous 0.629630
moral_disputes 0.575145
moral_scenarios 0.248045
nutrition 0.614379
philosophy 0.643087
prehistory 0.546296
professional_accounting 0.358156
professional_law 0.373533
professional_medicine 0.500000
professional_psychology 0.495098
public_relations 0.500000
security_studies 0.665306
sociology 0.701493
us_foreign_policy 0.700000
virology 0.433735
world_religions 0.672515
INFO: 2024-10-17 07:35:18,289: llmtf.base.nlpcoreteam/ruMMLU: metric
subject
STEM 0.496176
humanities 0.566481
other (business, health, misc.) 0.541724
social sciences 0.592209
INFO: 2024-10-17 07:35:18,294: llmtf.base.nlpcoreteam/ruMMLU: {'acc': 0.549147460511024}
INFO: 2024-10-17 07:35:18,341: llmtf.base.evaluator: Ended eval
INFO: 2024-10-17 07:35:18,343: llmtf.base.evaluator:
mean daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/ruOpenBookQA darumeru/ruWorldTree nlpcoreteam/ruMMLU
0.572 0.398 0.296 0.680 0.502 0.534 0.721 0.895 0.549
INFO: 2024-10-17 07:35:27,953: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/enmmlu']
INFO: 2024-10-17 07:35:27,953: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [131508]
INFO: 2024-10-17 07:35:27,953: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-17 07:37:30,758: llmtf.base.nlpcoreteam/enMMLU: Loading Dataset: 122.80s
INFO: 2024-10-17 07:43:10,625: llmtf.base.nlpcoreteam/enMMLU: Processing Dataset: 339.87s
INFO: 2024-10-17 07:43:10,626: llmtf.base.nlpcoreteam/enMMLU: Results for nlpcoreteam/enMMLU:
INFO: 2024-10-17 07:43:10,691: llmtf.base.nlpcoreteam/enMMLU: metric
subject
abstract_algebra 0.370000
anatomy 0.622222
astronomy 0.697368
business_ethics 0.670000
clinical_knowledge 0.709434
college_biology 0.701389
college_chemistry 0.450000
college_computer_science 0.570000
college_mathematics 0.360000
college_medicine 0.670520
college_physics 0.480392
computer_security 0.720000
conceptual_physics 0.655319
econometrics 0.500000
electrical_engineering 0.565517
elementary_mathematics 0.539683
formal_logic 0.357143
global_facts 0.370000
high_school_biology 0.800000
high_school_chemistry 0.561576
high_school_computer_science 0.670000
high_school_european_history 0.763636
high_school_geography 0.772727
high_school_government_and_politics 0.849741
high_school_macroeconomics 0.679487
high_school_mathematics 0.440741
high_school_microeconomics 0.756303
high_school_physics 0.450331
high_school_psychology 0.849541
high_school_statistics 0.643519
high_school_us_history 0.813725
high_school_world_history 0.835443
human_aging 0.695067
human_sexuality 0.763359
international_law 0.768595
jurisprudence 0.787037
logical_fallacies 0.779141
machine_learning 0.464286
management 0.805825
marketing 0.884615
medical_genetics 0.750000
miscellaneous 0.784163
moral_disputes 0.650289
moral_scenarios 0.270391
nutrition 0.718954
philosophy 0.717042
prehistory 0.737654
professional_accounting 0.496454
professional_law 0.458931
professional_medicine 0.672794
professional_psychology 0.668301
public_relations 0.681818
security_studies 0.718367
sociology 0.810945
us_foreign_policy 0.790000
virology 0.487952
world_religions 0.812865
INFO: 2024-10-17 07:43:10,700: llmtf.base.nlpcoreteam/enMMLU: metric
subject
STEM 0.563340
humanities 0.673223
other (business, health, misc.) 0.667000
social sciences 0.736716
INFO: 2024-10-17 07:43:10,705: llmtf.base.nlpcoreteam/enMMLU: {'acc': 0.6600696360837558}
INFO: 2024-10-17 07:43:10,741: llmtf.base.evaluator: Ended eval
INFO: 2024-10-17 07:43:10,743: llmtf.base.evaluator:
mean daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/ruOpenBookQA darumeru/ruWorldTree nlpcoreteam/enMMLU nlpcoreteam/ruMMLU
0.582 0.398 0.296 0.680 0.502 0.534 0.721 0.895 0.660 0.549
INFO: 2024-10-17 07:43:20,115: llmtf.base.evaluator: Starting eval on ['daru/treewayabstractive']
INFO: 2024-10-17 07:43:20,115: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [131508]
INFO: 2024-10-17 07:43:20,115: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-17 07:43:24,372: llmtf.base.daru/treewayabstractive: Loading Dataset: 4.26s
INFO: 2024-10-17 07:47:01,407: llmtf.base.daru/treewayabstractive: Processing Dataset: 217.03s
INFO: 2024-10-17 07:47:01,407: llmtf.base.daru/treewayabstractive: Results for daru/treewayabstractive:
INFO: 2024-10-17 07:47:01,408: llmtf.base.daru/treewayabstractive: {'rouge1': 0.32720307606797727, 'rouge2': 0.10857945570692258}
INFO: 2024-10-17 07:47:01,409: llmtf.base.evaluator: Ended eval
INFO: 2024-10-17 07:47:01,410: llmtf.base.evaluator:
mean daru/treewayabstractive daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/ruOpenBookQA darumeru/ruWorldTree nlpcoreteam/enMMLU nlpcoreteam/ruMMLU
0.545 0.218 0.398 0.296 0.680 0.502 0.534 0.721 0.895 0.660 0.549
INFO: 2024-10-17 07:47:10,811: llmtf.base.evaluator: Starting eval on ['darumeru/cp_para_ru']
INFO: 2024-10-17 07:47:10,811: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [131508]
INFO: 2024-10-17 07:47:10,811: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-17 07:47:15,676: llmtf.base.darumeru/cp_para_ru: Loading Dataset: 4.86s
INFO: 2024-10-17 07:49:51,029: llmtf.base.darumeru/cp_para_ru: Processing Dataset: 155.35s
INFO: 2024-10-17 07:49:51,030: llmtf.base.darumeru/cp_para_ru: Results for darumeru/cp_para_ru:
INFO: 2024-10-17 07:49:51,031: llmtf.base.darumeru/cp_para_ru: {'symbol_per_token': 3.76859951568896, 'len': 0.9950709951674359, 'lcs': 0.9}
INFO: 2024-10-17 07:49:51,031: llmtf.base.evaluator: Ended eval
INFO: 2024-10-17 07:49:51,032: llmtf.base.evaluator:
mean daru/treewayabstractive daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/cp_para_ru darumeru/ruOpenBookQA darumeru/ruWorldTree nlpcoreteam/enMMLU nlpcoreteam/ruMMLU
0.578 0.218 0.398 0.296 0.680 0.502 0.534 0.900 0.721 0.895 0.660 0.549
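
Note on the summary rows: the figures above appear consistent with a two-step aggregation, sketched below. This is inferred from the logged numbers only, not taken from the llmtf source. The sketch assumes each task score is the unweighted mean of that task's reported metrics (checked here for the three tasks where this is distinguishable from using a single metric) and that the `mean` column is the unweighted mean of the per-task scores. For darumeru/cp_para_ru the table shows 0.900, which matches the `lcs` value alone rather than an average of all three reported metrics.

```python
from statistics import mean

# Metric values copied from the result lines in the log above.
multi_metric = {
    "darumeru/MultiQ": [0.3485719410941241, 0.24282982791587],              # f1, em
    "darumeru/RCB": [0.5272727272727272, 0.47584611730940257],              # acc, f1_macro
    "daru/treewayabstractive": [0.32720307606797727, 0.10857945570692258],  # rouge1, rouge2
}

# Assumption: a task's score is the plain average of its metrics.
per_task = {name: round(mean(vals), 3) for name, vals in multi_metric.items()}

# Per-task scores copied from the final summary row of the log.
final_row = {
    "daru/treewayabstractive": 0.218,
    "daru/treewayextractive": 0.398,
    "darumeru/MultiQ": 0.296,
    "darumeru/PARus": 0.680,
    "darumeru/RCB": 0.502,
    "darumeru/RWSD": 0.534,
    "darumeru/cp_para_ru": 0.900,
    "darumeru/ruOpenBookQA": 0.721,
    "darumeru/ruWorldTree": 0.895,
    "nlpcoreteam/enMMLU": 0.660,
    "nlpcoreteam/ruMMLU": 0.549,
}

# Assumption: the "mean" column is the plain average over tasks.
overall = round(mean(list(final_row.values())), 3)

print(per_task)
print(overall)
```

Under these assumptions the recomputed values agree with the logged table to three decimals.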