modernbert-base-multi-head-values-context

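This model is a fine-tuned version of answerdotai/ModernBERT-base on a dataset that is not named in this card. It achieves the following results on the evaluation set. These figures match the final logged checkpoint (epoch 17.5, step 26845) in the training results table: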

Loss: 0.2990
Subset Accuracy: 0.2753
F1 Macro: 0.3032
F1 Micro: 0.3876
Precision Macro: 0.4109
Recall Macro: 0.2499
Roc Auc: 0.7910
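Subset accuracy, micro/macro F1, and macro precision/recall are standard multi-label metrics. The card does not include the evaluation code, so the following is only a minimal sketch of how such numbers are commonly computed. It assumes per-label sigmoid probabilities, a 0.5 decision threshold, and macro-averaged per-label ROC AUC; none of these choices is stated in the card. The helper names (`binarize`, `safe_div`, `multilabel_metrics`) are hypothetical. Macro scores average per-label values, so rare labels weigh as much as common ones, while micro scores pool every label decision.

```python
from typing import List, Sequence


def binarize(probs: Sequence[Sequence[float]], threshold: float = 0.5) -> List[List[int]]:
    """Turn per-label probabilities into 0/1 predictions (the threshold is an assumption)."""
    return [[int(p >= threshold) for p in row] for row in probs]


def safe_div(num: float, den: float) -> float:
    # Treat 0/0 as 0, e.g. for a label that is never predicted.
    return num / den if den else 0.0


def multilabel_metrics(
    y_true: Sequence[Sequence[int]],
    y_prob: Sequence[Sequence[float]],
    threshold: float = 0.5,
) -> dict:
    y_pred = binarize(y_prob, threshold)
    n_labels = len(y_true[0])

    # Subset accuracy: a sample counts only if every label matches exactly.
    subset_acc = sum(list(t) == p for t, p in zip(y_true, y_pred)) / len(y_true)

    precisions, recalls, f1s, aucs = [], [], [], []
    tp_all = fp_all = fn_all = 0
    for j in range(n_labels):
        t = [row[j] for row in y_true]
        p = [row[j] for row in y_pred]
        tp = sum(1 for a, b in zip(t, p) if a and b)
        fp = sum(1 for a, b in zip(t, p) if not a and b)
        fn = sum(1 for a, b in zip(t, p) if a and not b)
        tp_all, fp_all, fn_all = tp_all + tp, fp_all + fp, fn_all + fn

        prec, rec = safe_div(tp, tp + fp), safe_div(tp, tp + fn)
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(safe_div(2 * prec * rec, prec + rec))

        # Per-label ROC AUC: chance that a random positive scores above a random
        # negative (ties count half). Labels with only one class are skipped.
        pos = [row[j] for row, y in zip(y_prob, t) if y]
        neg = [row[j] for row, y in zip(y_prob, t) if not y]
        if pos and neg:
            wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
            aucs.append(wins / (len(pos) * len(neg)))

    micro_p = safe_div(tp_all, tp_all + fp_all)
    micro_r = safe_div(tp_all, tp_all + fn_all)
    return {
        "subset_accuracy": subset_acc,
        "f1_macro": sum(f1s) / n_labels,
        "f1_micro": safe_div(2 * micro_p * micro_r, micro_p + micro_r),
        "precision_macro": sum(precisions) / n_labels,
        "recall_macro": sum(recalls) / n_labels,
        "roc_auc": sum(aucs) / len(aucs) if aucs else float("nan"),
    }
```

With real model outputs, `y_prob` would hold the sigmoid of the logits for each evaluation example. Because subset accuracy requires every label to be right at once, it is typically much lower than micro F1, as in the results above.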

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-06
train_batch_size: 2
eval_batch_size: 2
seed: 2025
gradient_accumulation_steps: 8
total_train_batch_size: 16
optimizer: AdamW (OptimizerNames.ADAMW_TORCH, the PyTorch implementation) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: linear
lr_scheduler_warmup_ratio: 0.01
num_epochs: 33 (configured; the logged training results end at epoch 17.5)
mixed_precision_training: Native AMP
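The hyperparameters above imply an effective batch size of 2 × 8 = 16 and, with `lr_scheduler_type: linear`, a learning rate that rises linearly during warmup and then decays linearly toward zero. The sketch below reproduces that schedule in plain Python, mirroring the formula used by `transformers.get_linear_schedule_with_warmup`. It assumes 1534 optimizer steps per epoch (read off the epoch 1.0 row of the training results) and the configured 33 epochs, so the total and warmup step counts are derived estimates rather than values recorded in this card. `linear_lr` is a hypothetical helper name.

```python
import math

# Values taken from the hyperparameter list above.
LEARNING_RATE = 5e-06
TRAIN_BATCH_SIZE = 2
GRADIENT_ACCUMULATION_STEPS = 8
WARMUP_RATIO = 0.01
NUM_EPOCHS = 33

# Assumption: 1534 optimizer steps per epoch, inferred from the
# "Epoch 1.0 / Step 1534" row of the training results table.
STEPS_PER_EPOCH = 1534

# Each optimizer step accumulates gradients over 8 micro-batches of 2 examples.
effective_batch_size = TRAIN_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS

# Estimated schedule length; the warmup step count is rounded up.
num_training_steps = STEPS_PER_EPOCH * NUM_EPOCHS
num_warmup_steps = math.ceil(num_training_steps * WARMUP_RATIO)


def linear_lr(step: int) -> float:
    """Learning rate at an optimizer step under linear warmup + linear decay."""
    if step < num_warmup_steps:
        # Linear warmup from 0 up to the peak learning rate.
        return LEARNING_RATE * step / max(1, num_warmup_steps)
    # Linear decay from the peak down to 0 at num_training_steps.
    remaining = num_training_steps - step
    return LEARNING_RATE * max(0.0, remaining / max(1, num_training_steps - num_warmup_steps))


if __name__ == "__main__":
    print("effective batch size:", effective_batch_size)
    print("estimated total steps:", num_training_steps)
    print("estimated warmup steps:", num_warmup_steps)
    # Step 26845 is the last logged step (epoch 17.5).
    print(f"lr at last logged step: {linear_lr(26845):.3e}")
```

Under these assumptions the schedule spans about 50622 steps with 507 warmup steps, and the learning rate at the last logged step (26845) is roughly 2.37e-06, about half the peak. If the assumptions hold, the logged run ended well before the linear decay reached zero.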

Training results

Training Loss	Epoch	Step	Validation Loss	Subset Accuracy	F1 Macro	F1 Micro	Precision Macro	Recall Macro	Roc Auc
2.5451	0.5002	767	0.2012	0.0027	0.0023	0.0050	0.0718	0.0012	0.6531
1.5075	1.0	1534	0.1838	0.0768	0.0569	0.1319	0.2330	0.0353	0.7437
1.4382	1.5002	2301	0.1781	0.1437	0.1318	0.2281	0.3534	0.0891	0.7792
1.3858	2.0	3068	0.1710	0.1680	0.1582	0.2615	0.4338	0.1091	0.7962
1.3157	2.5002	3835	0.1681	0.1822	0.1787	0.2796	0.4967	0.1267	0.8058
1.291	3.0	4602	0.1622	0.2229	0.2115	0.3291	0.6302	0.1523	0.8195
1.2388	3.5002	5369	0.1614	0.2026	0.2201	0.3082	0.6143	0.1536	0.8222
1.1993	4.0	6136	0.1583	0.2445	0.2454	0.3554	0.5956	0.1783	0.8291
1.1415	4.5002	6903	0.1608	0.2793	0.2883	0.3934	0.5614	0.2220	0.8288
1.1221	5.0	7670	0.1595	0.2384	0.2523	0.3533	0.5982	0.1761	0.8342
1.0726	5.5002	8437	0.1604	0.2727	0.2930	0.3906	0.5584	0.2178	0.8318
1.0381	6.0	9204	0.1629	0.2599	0.2693	0.3759	0.5421	0.2099	0.8315
0.9957	6.5002	9971	0.1662	0.2814	0.2856	0.4001	0.5380	0.2223	0.8300
0.9319	7.0	10738	0.1640	0.2604	0.2960	0.3820	0.5431	0.2201	0.8288
0.8279	7.5002	11505	0.1733	0.2788	0.2953	0.3939	0.5275	0.2307	0.8245
0.8365	8.0	12272	0.1742	0.2757	0.3004	0.3910	0.5030	0.2339	0.8218
0.7168	8.5002	13039	0.1810	0.2863	0.3063	0.4020	0.4589	0.2499	0.8202
0.7158	9.0	13806	0.1804	0.2758	0.3052	0.3910	0.4622	0.2392	0.8212
0.5827	9.5002	14573	0.1880	0.2878	0.3166	0.4034	0.4568	0.2584	0.8159
0.5958	10.0	15340	0.1906	0.2788	0.3114	0.3940	0.4912	0.2522	0.8134
0.4641	10.5002	16107	0.1978	0.2750	0.3104	0.3896	0.4505	0.2501	0.8106
0.4608	11.0	16874	0.2022	0.2724	0.3026	0.3880	0.4840	0.2470	0.8082
0.3546	11.5002	17641	0.2113	0.2773	0.3120	0.3922	0.4598	0.2556	0.8038
0.3575	12.0	18408	0.2133	0.2834	0.3092	0.3980	0.4361	0.2535	0.8045
0.2601	12.5002	19175	0.2226	0.2778	0.3104	0.3897	0.4274	0.2559	0.8003
0.258	13.0	19942	0.2275	0.2824	0.3176	0.3956	0.4188	0.2643	0.8003
0.1778	13.5002	20709	0.2375	0.2686	0.3035	0.3815	0.4103	0.2496	0.7994
0.1803	14.0	21476	0.2426	0.2713	0.3083	0.3865	0.4305	0.2522	0.7968
0.1233	14.5002	22243	0.2501	0.2781	0.3139	0.3906	0.4473	0.2592	0.7970
0.1197	15.0	23010	0.2566	0.2735	0.3081	0.3864	0.4231	0.2519	0.7950
0.0804	15.5002	23777	0.2653	0.2746	0.3065	0.3839	0.4267	0.2512	0.7941
0.0813	16.0	24544	0.2723	0.2740	0.3078	0.3861	0.4372	0.2505	0.7931
0.0548	16.5002	25311	0.2813	0.2776	0.3077	0.3922	0.4544	0.2500	0.7927
0.0535	17.0	26078	0.2882	0.2804	0.3093	0.3912	0.4497	0.2528	0.7914
0.0387	17.5002	26845	0.2990	0.2753	0.3032	0.3876	0.4109	0.2499	0.7910
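The table shows a typical overfitting pattern: training loss keeps falling to 0.0387, while validation loss bottoms out at 0.1583 (epoch 4.0) and then climbs back to 0.2990. The metrics reported at the top of this card come from the last row, not from the best one. As a small sketch, the snippet below copies the step, validation loss, F1 macro, and ROC AUC columns of the table and picks the best checkpoint for each metric. `EvalRow` and `best` are hypothetical helpers. In a `transformers` Trainer run this selection is usually configured through the `load_best_model_at_end` and `metric_for_best_model` options of `TrainingArguments`; whether this run used them is not stated.

```python
from typing import List, NamedTuple


class EvalRow(NamedTuple):
    step: int
    val_loss: float
    f1_macro: float
    roc_auc: float


# (step, validation loss, F1 macro, ROC AUC) copied from the training results table.
ROWS: List[EvalRow] = [EvalRow(*r) for r in [
    (767, 0.2012, 0.0023, 0.6531), (1534, 0.1838, 0.0569, 0.7437),
    (2301, 0.1781, 0.1318, 0.7792), (3068, 0.1710, 0.1582, 0.7962),
    (3835, 0.1681, 0.1787, 0.8058), (4602, 0.1622, 0.2115, 0.8195),
    (5369, 0.1614, 0.2201, 0.8222), (6136, 0.1583, 0.2454, 0.8291),
    (6903, 0.1608, 0.2883, 0.8288), (7670, 0.1595, 0.2523, 0.8342),
    (8437, 0.1604, 0.2930, 0.8318), (9204, 0.1629, 0.2693, 0.8315),
    (9971, 0.1662, 0.2856, 0.8300), (10738, 0.1640, 0.2960, 0.8288),
    (11505, 0.1733, 0.2953, 0.8245), (12272, 0.1742, 0.3004, 0.8218),
    (13039, 0.1810, 0.3063, 0.8202), (13806, 0.1804, 0.3052, 0.8212),
    (14573, 0.1880, 0.3166, 0.8159), (15340, 0.1906, 0.3114, 0.8134),
    (16107, 0.1978, 0.3104, 0.8106), (16874, 0.2022, 0.3026, 0.8082),
    (17641, 0.2113, 0.3120, 0.8038), (18408, 0.2133, 0.3092, 0.8045),
    (19175, 0.2226, 0.3104, 0.8003), (19942, 0.2275, 0.3176, 0.8003),
    (20709, 0.2375, 0.3035, 0.7994), (21476, 0.2426, 0.3083, 0.7968),
    (22243, 0.2501, 0.3139, 0.7970), (23010, 0.2566, 0.3081, 0.7950),
    (23777, 0.2653, 0.3065, 0.7941), (24544, 0.2723, 0.3078, 0.7931),
    (25311, 0.2813, 0.3077, 0.7927), (26078, 0.2882, 0.3093, 0.7914),
    (26845, 0.2990, 0.3032, 0.7910),
]]

STEPS_PER_EPOCH = 1534  # step count at epoch 1.0 in the table


def best(rows: List[EvalRow], key: str, higher_is_better: bool = True) -> EvalRow:
    """Return the row that optimizes `key`; the earliest row wins ties."""
    pick = max if higher_is_better else min
    return pick(rows, key=lambda r: getattr(r, key))


if __name__ == "__main__":
    for name, higher in [("val_loss", False), ("f1_macro", True), ("roc_auc", True)]:
        row = best(ROWS, name, higher)
        epoch = row.step / STEPS_PER_EPOCH
        print(f"best {name}: step {row.step} (~epoch {epoch:.1f}) = {getattr(row, name):.4f}")
```

By validation loss the best checkpoint is step 6136 (epoch 4.0), by F1 macro it is step 19942 (epoch 13.0, 0.3176), and by ROC AUC it is step 7670 (epoch 5.0, 0.8342); all differ from the final step 26845. Which criterion to prefer depends on the intended use, which the card does not yet describe.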

Framework versions

Transformers 4.53.2
Pytorch 2.6.0+cu124
Datasets 2.14.4
Tokenizers 0.21.2


Safetensors

Model size: 150M params
Tensor type: F32

Model tree for DayCardoso/modernbert-base-multi-head-values-context

Base model: answerdotai/ModernBERT-base (this model is fine-tuned from it)
