# TBD-LLaMA-2B-Final-Direction-2B
This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 3.8900
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 16
- total_train_batch_size: 64
- total_eval_batch_size: 4
- optimizer: AdamW (torch fused) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 139
- training_steps: 13966
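The learning-rate schedule implied by the hyperparameters above (linear warmup for 139 steps, then cosine decay over the remaining steps, peak LR 2e-05) can be sketched in plain Python. This is a minimal sketch that mirrors the shape of the `transformers` cosine-with-warmup scheduler, not the library's exact implementation, so values at intermediate steps may differ slightly:

```python
import math

# Constants taken from the hyperparameter list above.
PEAK_LR = 2e-05
WARMUP_STEPS = 139
TRAINING_STEPS = 13966

def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step."""
    if step < WARMUP_STEPS:
        # Linear warmup from 0 up to the peak learning rate.
        return PEAK_LR * step / WARMUP_STEPS
    # Cosine decay from the peak down to 0 over the remaining steps.
    progress = (step - WARMUP_STEPS) / (TRAINING_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))

print(lr_at(0))      # 0.0 (start of warmup)
print(lr_at(139))    # 2e-05 (peak, end of warmup)
print(lr_at(13966))  # 0.0 (end of training)
```

Note that the effective batch size of 64 listed above is the product of the per-device batch size (1), the number of devices (4), and the gradient-accumulation steps (16).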
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|:--------------|:-------|:------|:----------------|
| 8.9472 | 0.0143 | 200 | 8.9381 |
| 6.7664 | 0.0286 | 400 | 6.7485 |
| 6.6429 | 0.0430 | 600 | 6.6299 |
| 6.5725 | 0.0573 | 800 | 6.5598 |
| 6.4746 | 0.0716 | 1000 | 6.4666 |
| 6.345 | 0.0859 | 1200 | 6.3290 |
| 6.1452 | 0.1002 | 1400 | 6.1231 |
| 5.9711 | 0.1146 | 1600 | 5.9283 |
| 5.8076 | 0.1289 | 1800 | 5.7896 |
| 5.718 | 0.1432 | 2000 | 5.6944 |
| 5.6422 | 0.1575 | 2200 | 5.6219 |
| 5.5956 | 0.1718 | 2400 | 5.5653 |
| 5.5424 | 0.1862 | 2600 | 5.5163 |
| 5.4527 | 0.2005 | 2800 | 5.4252 |
| 4.7472 | 0.2148 | 3000 | 4.6523 |
| 4.5528 | 0.2291 | 3200 | 4.4846 |
| 4.503 | 0.2434 | 3400 | 4.3817 |
| 4.427 | 0.2578 | 3600 | 4.3165 |
| 4.4322 | 0.2721 | 3800 | 4.2725 |
| 4.3265 | 0.2864 | 4000 | 4.2409 |
| 4.3255 | 0.3007 | 4200 | 4.2157 |
| 4.322 | 0.3150 | 4400 | 4.1930 |
| 4.1982 | 0.3294 | 4600 | 4.1759 |
| 4.2197 | 0.3437 | 4800 | 4.1609 |
| 4.2109 | 0.3580 | 5000 | 4.1478 |
| 4.1553 | 0.3723 | 5200 | 4.1329 |
| 4.169 | 0.3866 | 5400 | 4.1215 |
| 4.2068 | 0.4010 | 5600 | 4.1093 |
| 4.182 | 0.4153 | 5800 | 4.0969 |
| 4.2148 | 0.4296 | 6000 | 4.0841 |
| 4.0511 | 0.4439 | 6200 | 4.0716 |
| 4.0997 | 0.4582 | 6400 | 4.0592 |
| 4.0322 | 0.4726 | 6600 | 4.0488 |
| 3.9972 | 0.4869 | 6800 | 4.0372 |
| 4.0335 | 0.5012 | 7000 | 4.0258 |
| 4.0742 | 0.5155 | 7200 | 4.0168 |
| 4.003 | 0.5298 | 7400 | 4.0082 |
| 4.0007 | 0.5442 | 7600 | 3.9992 |
| 4.1114 | 0.5585 | 7800 | 3.9898 |
| 3.8742 | 0.5728 | 8000 | 3.9831 |
| 4.0346 | 0.5871 | 8200 | 3.9765 |
| 3.8871 | 0.6014 | 8400 | 3.9686 |
| 3.9689 | 0.6158 | 8600 | 3.9626 |
| 4.0003 | 0.6301 | 8800 | 3.9580 |
| 4.0529 | 0.6444 | 9000 | 3.9496 |
| 3.9973 | 0.6587 | 9200 | 3.9456 |
| 4.0418 | 0.6730 | 9400 | 3.9409 |
| 4.0237 | 0.6874 | 9600 | 3.9355 |
| 3.9256 | 0.7017 | 9800 | 3.9299 |
| 3.8549 | 0.7160 | 10000 | 3.9249 |
| 3.9872 | 0.7303 | 10200 | 3.9215 |
| 3.9918 | 0.7446 | 10400 | 3.9180 |
| 4.0075 | 0.7590 | 10600 | 3.9137 |
| 3.9235 | 0.7733 | 10800 | 3.9107 |
| 3.9416 | 0.7876 | 11000 | 3.9069 |
| 3.9939 | 0.8019 | 11200 | 3.9053 |
| 4.0625 | 0.8162 | 11400 | 3.9030 |
| 3.9773 | 0.8306 | 11600 | 3.9010 |
| 3.8279 | 0.8449 | 11800 | 3.8990 |
| 3.8631 | 0.8592 | 12000 | 3.8970 |
| 3.8593 | 0.8735 | 12200 | 3.8953 |
| 3.9531 | 0.8878 | 12400 | 3.8938 |
| 3.8922 | 0.9022 | 12600 | 3.8927 |
| 3.9151 | 0.9165 | 12800 | 3.8917 |
| 3.9119 | 0.9308 | 13000 | 3.8910 |
| 3.9261 | 0.9451 | 13200 | 3.8905 |
| 3.9169 | 0.9594 | 13400 | 3.8903 |
| 3.8439 | 0.9738 | 13600 | 3.8900 |
| 3.8795 | 0.9881 | 13800 | 3.8900 |
### Framework versions
- Transformers 4.56.1
- PyTorch 2.8.0a0+5228986c39.nv25.05
- Datasets 4.0.0
- Tokenizers 0.22.0