| INFO: 2024-11-15 17:16:28,761: llmtf.base.evaluator: Starting eval on ['darumeru/multiq'] | |
| INFO: 2024-11-15 17:16:28,761: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [145111] | |
| INFO: 2024-11-15 17:16:28,761: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>'] | |
| INFO: 2024-11-15 17:16:31,122: llmtf.base.evaluator: Starting eval on ['darumeru/parus'] | |
| INFO: 2024-11-15 17:16:31,123: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [145111] | |
| INFO: 2024-11-15 17:16:31,123: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>'] | |
| INFO: 2024-11-15 17:16:33,057: llmtf.base.darumeru/MultiQ: Loading Dataset: 4.30s | |
| INFO: 2024-11-15 17:16:33,199: llmtf.base.darumeru/PARus: Loading Dataset: 2.08s | |
| INFO: 2024-11-15 17:16:35,422: llmtf.base.darumeru/PARus: Processing Dataset: 2.22s | |
| INFO: 2024-11-15 17:16:35,422: llmtf.base.darumeru/PARus: Results for darumeru/PARus: | |
| INFO: 2024-11-15 17:16:35,431: llmtf.base.darumeru/PARus: {'acc': 0.28} | |
| INFO: 2024-11-15 17:16:35,432: llmtf.base.evaluator: Ended eval | |
| INFO: 2024-11-15 17:16:35,432: llmtf.base.evaluator: | |
| mean darumeru/PARus | |
| 0.280 0.280 | |
| INFO: 2024-11-15 17:16:43,944: llmtf.base.evaluator: Starting eval on ['darumeru/ruopenbookqa'] | |
| INFO: 2024-11-15 17:16:43,944: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [145111] | |
| INFO: 2024-11-15 17:16:43,944: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>'] | |
| INFO: 2024-11-15 17:16:47,099: llmtf.base.darumeru/ruOpenBookQA: Loading Dataset: 3.16s | |
| INFO: 2024-11-15 17:17:11,239: llmtf.base.darumeru/ruOpenBookQA: Processing Dataset: 24.14s | |
| INFO: 2024-11-15 17:17:11,239: llmtf.base.darumeru/ruOpenBookQA: Results for darumeru/ruOpenBookQA: | |
| INFO: 2024-11-15 17:17:11,250: llmtf.base.darumeru/ruOpenBookQA: {'acc': 0.47465635738831613, 'f1_macro': 0.4733795691558009} | |
| INFO: 2024-11-15 17:17:11,257: llmtf.base.evaluator: Ended eval | |
| INFO: 2024-11-15 17:17:11,258: llmtf.base.evaluator: | |
| mean darumeru/PARus darumeru/ruOpenBookQA | |
| 0.377 0.280 0.474 | |
| INFO: 2024-11-15 17:17:20,171: llmtf.base.evaluator: Starting eval on ['darumeru/rwsd'] | |
| INFO: 2024-11-15 17:17:20,171: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [145111] | |
| INFO: 2024-11-15 17:17:20,171: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>'] | |
| INFO: 2024-11-15 17:17:22,512: llmtf.base.darumeru/RWSD: Loading Dataset: 2.34s | |
| INFO: 2024-11-15 17:17:25,027: llmtf.base.darumeru/RWSD: Processing Dataset: 2.51s | |
| INFO: 2024-11-15 17:17:25,028: llmtf.base.darumeru/RWSD: Results for darumeru/RWSD: | |
| INFO: 2024-11-15 17:17:25,028: llmtf.base.darumeru/RWSD: {'acc': 0.4362745098039216} | |
| INFO: 2024-11-15 17:17:25,029: llmtf.base.evaluator: Ended eval | |
| INFO: 2024-11-15 17:17:25,029: llmtf.base.evaluator: | |
| mean darumeru/PARus darumeru/RWSD darumeru/ruOpenBookQA | |
| 0.397 0.280 0.436 0.474 | |
| INFO: 2024-11-15 17:17:33,677: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/rummlu'] | |
| INFO: 2024-11-15 17:17:33,677: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [145111] | |
| INFO: 2024-11-15 17:17:33,678: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>'] | |
| INFO: 2024-11-15 17:19:15,640: llmtf.base.nlpcoreteam/ruMMLU: Loading Dataset: 101.96s | |
| INFO: 2024-11-15 17:22:11,674: llmtf.base.nlpcoreteam/ruMMLU: Processing Dataset: 176.03s | |
| INFO: 2024-11-15 17:22:11,674: llmtf.base.nlpcoreteam/ruMMLU: Results for nlpcoreteam/ruMMLU: | |
| INFO: 2024-11-15 17:22:11,735: llmtf.base.nlpcoreteam/ruMMLU: metric | |
| subject | |
| abstract_algebra 0.270000 | |
| anatomy 0.392593 | |
| astronomy 0.421053 | |
| business_ethics 0.410000 | |
| clinical_knowledge 0.426415 | |
| college_biology 0.340278 | |
| college_chemistry 0.200000 | |
| college_computer_science 0.330000 | |
| college_mathematics 0.300000 | |
| college_medicine 0.358382 | |
| college_physics 0.294118 | |
| computer_security 0.540000 | |
| conceptual_physics 0.331915 | |
| econometrics 0.359649 | |
| electrical_engineering 0.496552 | |
| elementary_mathematics 0.399471 | |
| formal_logic 0.269841 | |
| global_facts 0.390000 | |
| high_school_biology 0.393548 | |
| high_school_chemistry 0.418719 | |
| high_school_computer_science 0.610000 | |
| high_school_european_history 0.478788 | |
| high_school_geography 0.525253 | |
| high_school_government_and_politics 0.388601 | |
| high_school_macroeconomics 0.335897 | |
| high_school_mathematics 0.366667 | |
| high_school_microeconomics 0.407563 | |
| high_school_physics 0.331126 | |
| high_school_psychology 0.436697 | |
| high_school_statistics 0.356481 | |
| high_school_us_history 0.401961 | |
| high_school_world_history 0.535865 | |
| human_aging 0.403587 | |
| human_sexuality 0.412214 | |
| international_law 0.652893 | |
| jurisprudence 0.500000 | |
| logical_fallacies 0.374233 | |
| machine_learning 0.366071 | |
| management 0.456311 | |
| marketing 0.645299 | |
| medical_genetics 0.440000 | |
| miscellaneous 0.429119 | |
| moral_disputes 0.447977 | |
| moral_scenarios 0.242458 | |
| nutrition 0.408497 | |
| philosophy 0.459807 | |
| prehistory 0.398148 | |
| professional_accounting 0.329787 | |
| professional_law 0.327249 | |
| professional_medicine 0.294118 | |
| professional_psychology 0.370915 | |
| public_relations 0.436364 | |
| security_studies 0.416327 | |
| sociology 0.562189 | |
| us_foreign_policy 0.610000 | |
| virology 0.373494 | |
| world_religions 0.456140 | |
| INFO: 2024-11-15 17:22:11,743: llmtf.base.nlpcoreteam/ruMMLU: metric | |
| subject | |
| STEM 0.375889 | |
| humanities 0.426566 | |
| other (business, health, misc.) 0.411257 | |
| social sciences 0.438472 | |
| INFO: 2024-11-15 17:22:11,748: llmtf.base.nlpcoreteam/ruMMLU: {'acc': 0.41304613612768604} | |
| INFO: 2024-11-15 17:22:11,782: llmtf.base.evaluator: Ended eval | |
| INFO: 2024-11-15 17:22:11,783: llmtf.base.evaluator: | |
| mean darumeru/PARus darumeru/RWSD darumeru/ruOpenBookQA nlpcoreteam/ruMMLU | |
| 0.401 0.280 0.436 0.474 0.413 | |
| INFO: 2024-11-15 17:22:20,448: llmtf.base.evaluator: Starting eval on ['daru/treewayabstractive'] | |
| INFO: 2024-11-15 17:22:20,448: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [145111] | |
| INFO: 2024-11-15 17:22:20,448: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>'] | |
| INFO: 2024-11-15 17:22:24,310: llmtf.base.daru/treewayabstractive: Loading Dataset: 3.86s | |
| INFO: 2024-11-15 17:24:23,418: llmtf.base.darumeru/MultiQ: Processing Dataset: 470.36s | |
| INFO: 2024-11-15 17:24:23,419: llmtf.base.darumeru/MultiQ: Results for darumeru/MultiQ: | |
| INFO: 2024-11-15 17:24:23,420: llmtf.base.darumeru/MultiQ: {'f1': 0.22938942868828982, 'em': 0.13479923518164436} | |
| INFO: 2024-11-15 17:24:23,424: llmtf.base.evaluator: Ended eval | |
| INFO: 2024-11-15 17:24:23,425: llmtf.base.evaluator: | |
| mean darumeru/MultiQ darumeru/PARus darumeru/RWSD darumeru/ruOpenBookQA nlpcoreteam/ruMMLU | |
| 0.357 0.182 0.280 0.436 0.474 0.413 | |
| INFO: 2024-11-15 17:24:32,018: llmtf.base.evaluator: Starting eval on ['darumeru/rcb'] | |
| INFO: 2024-11-15 17:24:32,018: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [145111] | |
| INFO: 2024-11-15 17:24:32,019: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>'] | |
| INFO: 2024-11-15 17:24:34,515: llmtf.base.darumeru/RCB: Loading Dataset: 2.50s | |
| INFO: 2024-11-15 17:24:37,191: llmtf.base.darumeru/RCB: Processing Dataset: 2.68s | |
| INFO: 2024-11-15 17:24:37,191: llmtf.base.darumeru/RCB: Results for darumeru/RCB: | |
| INFO: 2024-11-15 17:24:37,195: llmtf.base.darumeru/RCB: {'acc': 0.4636363636363636, 'f1_macro': 0.4278154677497561} | |
| INFO: 2024-11-15 17:24:37,195: llmtf.base.evaluator: Ended eval | |
| INFO: 2024-11-15 17:24:37,196: llmtf.base.evaluator: | |
| mean darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/ruOpenBookQA nlpcoreteam/ruMMLU | |
| 0.372 0.182 0.280 0.446 0.436 0.474 0.413 | |
| INFO: 2024-11-15 17:24:45,714: llmtf.base.evaluator: Starting eval on ['darumeru/ruworldtree'] | |
| INFO: 2024-11-15 17:24:45,714: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [145111] | |
| INFO: 2024-11-15 17:24:45,714: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>'] | |
| INFO: 2024-11-15 17:24:48,302: llmtf.base.darumeru/ruWorldTree: Loading Dataset: 2.59s | |
| INFO: 2024-11-15 17:24:49,696: llmtf.base.darumeru/ruWorldTree: Processing Dataset: 1.39s | |
| INFO: 2024-11-15 17:24:49,696: llmtf.base.darumeru/ruWorldTree: Results for darumeru/ruWorldTree: | |
| INFO: 2024-11-15 17:24:49,699: llmtf.base.darumeru/ruWorldTree: {'acc': 0.6571428571428571, 'f1_macro': 0.6549941370855985} | |
| INFO: 2024-11-15 17:24:49,699: llmtf.base.evaluator: Ended eval | |
| INFO: 2024-11-15 17:24:49,700: llmtf.base.evaluator: | |
| mean darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/ruOpenBookQA darumeru/ruWorldTree nlpcoreteam/ruMMLU | |
| 0.412 0.182 0.280 0.446 0.436 0.474 0.656 0.413 | |
| INFO: 2024-11-15 17:24:58,281: llmtf.base.evaluator: Starting eval on ['daru/treewayextractive'] | |
| INFO: 2024-11-15 17:24:58,282: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [145111] | |
| INFO: 2024-11-15 17:24:58,282: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>'] | |
| INFO: 2024-11-15 17:25:11,202: llmtf.base.daru/treewayextractive: Loading Dataset: 12.92s | |
| INFO: 2024-11-15 17:26:13,880: llmtf.base.daru/treewayabstractive: Processing Dataset: 229.57s | |
| INFO: 2024-11-15 17:26:13,880: llmtf.base.daru/treewayabstractive: Results for daru/treewayabstractive: | |
| INFO: 2024-11-15 17:26:13,881: llmtf.base.daru/treewayabstractive: {'rouge1': 0.31763876629967247, 'rouge2': 0.10272501116299726} | |
| INFO: 2024-11-15 17:26:13,882: llmtf.base.evaluator: Ended eval | |
| INFO: 2024-11-15 17:26:13,883: llmtf.base.evaluator: | |
| mean daru/treewayabstractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/ruOpenBookQA darumeru/ruWorldTree nlpcoreteam/ruMMLU | |
| 0.387 0.210 0.182 0.280 0.446 0.436 0.474 0.656 0.413 | |
| INFO: 2024-11-15 17:26:58,894: llmtf.base.daru/treewayextractive: Processing Dataset: 107.69s | |
| INFO: 2024-11-15 17:26:58,894: llmtf.base.daru/treewayextractive: Results for daru/treewayextractive: | |
| INFO: 2024-11-15 17:26:59,122: llmtf.base.daru/treewayextractive: {'r-prec': 0.3720740981240981} | |
| INFO: 2024-11-15 17:26:59,162: llmtf.base.evaluator: Ended eval | |
| INFO: 2024-11-15 17:26:59,164: llmtf.base.evaluator: | |
| mean daru/treewayabstractive daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/ruOpenBookQA darumeru/ruWorldTree nlpcoreteam/ruMMLU | |
| 0.385 0.210 0.372 0.182 0.280 0.446 0.436 0.474 0.656 0.413 | |
| INFO: 2024-11-15 17:27:07,864: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/enmmlu'] | |
| INFO: 2024-11-15 17:27:07,864: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [145111] | |
| INFO: 2024-11-15 17:27:07,864: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>'] | |
| INFO: 2024-11-15 17:28:52,751: llmtf.base.nlpcoreteam/enMMLU: Loading Dataset: 104.89s | |
| INFO: 2024-11-15 17:31:30,875: llmtf.base.nlpcoreteam/enMMLU: Processing Dataset: 158.12s | |
| INFO: 2024-11-15 17:31:30,875: llmtf.base.nlpcoreteam/enMMLU: Results for nlpcoreteam/enMMLU: | |
| INFO: 2024-11-15 17:31:30,937: llmtf.base.nlpcoreteam/enMMLU: metric | |
| subject | |
| abstract_algebra 0.360000 | |
| anatomy 0.525926 | |
| astronomy 0.480263 | |
| business_ethics 0.510000 | |
| clinical_knowledge 0.581132 | |
| college_biology 0.500000 | |
| college_chemistry 0.280000 | |
| college_computer_science 0.390000 | |
| college_mathematics 0.290000 | |
| college_medicine 0.485549 | |
| college_physics 0.294118 | |
| computer_security 0.640000 | |
| conceptual_physics 0.459574 | |
| econometrics 0.394737 | |
| electrical_engineering 0.572414 | |
| elementary_mathematics 0.428571 | |
| formal_logic 0.333333 | |
| global_facts 0.350000 | |
| high_school_biology 0.567742 | |
| high_school_chemistry 0.477833 | |
| high_school_computer_science 0.660000 | |
| high_school_european_history 0.612121 | |
| high_school_geography 0.601010 | |
| high_school_government_and_politics 0.580311 | |
| high_school_macroeconomics 0.453846 | |
| high_school_mathematics 0.325926 | |
| high_school_microeconomics 0.537815 | |
| high_school_physics 0.350993 | |
| high_school_psychology 0.645872 | |
| high_school_statistics 0.435185 | |
| high_school_us_history 0.549020 | |
| high_school_world_history 0.649789 | |
| human_aging 0.533632 | |
| human_sexuality 0.526718 | |
| international_law 0.669421 | |
| jurisprudence 0.564815 | |
| logical_fallacies 0.607362 | |
| machine_learning 0.366071 | |
| management 0.621359 | |
| marketing 0.760684 | |
| medical_genetics 0.450000 | |
| miscellaneous 0.583653 | |
| moral_disputes 0.531792 | |
| moral_scenarios 0.242458 | |
| nutrition 0.549020 | |
| philosophy 0.517685 | |
| prehistory 0.540123 | |
| professional_accounting 0.382979 | |
| professional_law 0.362451 | |
| professional_medicine 0.345588 | |
| professional_psychology 0.449346 | |
| public_relations 0.509091 | |
| security_studies 0.555102 | |
| sociology 0.641791 | |
| us_foreign_policy 0.680000 | |
| virology 0.427711 | |
| world_religions 0.619883 | |
| INFO: 2024-11-15 17:31:30,945: llmtf.base.nlpcoreteam/enMMLU: metric | |
| subject | |
| STEM 0.437705 | |
| humanities 0.523096 | |
| other (business, health, misc.) 0.507659 | |
| social sciences 0.547970 | |
| INFO: 2024-11-15 17:31:30,950: llmtf.base.nlpcoreteam/enMMLU: {'acc': 0.5041077125064222} | |
| INFO: 2024-11-15 17:31:30,983: llmtf.base.evaluator: Ended eval | |
| INFO: 2024-11-15 17:31:30,985: llmtf.base.evaluator: | |
| mean daru/treewayabstractive daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/ruOpenBookQA darumeru/ruWorldTree nlpcoreteam/enMMLU nlpcoreteam/ruMMLU | |
| 0.397 0.210 0.372 0.182 0.280 0.446 0.436 0.474 0.656 0.504 0.413 | |
| INFO: 2024-11-15 17:31:39,751: llmtf.base.evaluator: Starting eval on ['darumeru/cp_para_ru'] | |
| INFO: 2024-11-15 17:31:39,751: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [145111] | |
| INFO: 2024-11-15 17:31:39,751: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>'] | |
| INFO: 2024-11-15 17:31:42,286: llmtf.base.darumeru/cp_para_ru: Loading Dataset: 2.53s | |
| INFO: 2024-11-15 17:34:56,464: llmtf.base.darumeru/cp_para_ru: Processing Dataset: 194.18s | |
| INFO: 2024-11-15 17:34:56,464: llmtf.base.darumeru/cp_para_ru: Results for darumeru/cp_para_ru: | |
| INFO: 2024-11-15 17:34:56,465: llmtf.base.darumeru/cp_para_ru: {'tokens_per_word': 1.902600722678228, 'symbol_per_token': 3.932343331088908, 'len': 0.967827425390648, 'lcs': 0.23} | |
| INFO: 2024-11-15 17:34:56,466: llmtf.base.evaluator: Ended eval | |
| INFO: 2024-11-15 17:34:56,466: llmtf.base.evaluator: | |
| mean daru/treewayabstractive daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/cp_para_ru darumeru/ruOpenBookQA darumeru/ruWorldTree nlpcoreteam/enMMLU nlpcoreteam/ruMMLU | |
| 0.382 0.210 0.372 0.182 0.280 0.446 0.436 0.230 0.474 0.656 0.504 0.413 | |