# TBD-LLaMA-2B-Final-Direction-2B
This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 3.8900
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 16
- total_train_batch_size: 64
- total_eval_batch_size: 4
- optimizer: AdamW (torch fused) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 139
- training_steps: 13966
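The learning-rate schedule implied by the hyperparameters above (linear warmup for 139 steps, then cosine decay over the remaining steps, peak LR 2e-05) can be sketched in plain Python. This is a minimal sketch that mirrors the shape of the `transformers` cosine-with-warmup scheduler, not the library's exact implementation, so values at intermediate steps may differ slightly:

```python
import math

# Constants taken from the hyperparameter list above.
PEAK_LR = 2e-05
WARMUP_STEPS = 139
TRAINING_STEPS = 13966

def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step."""
    if step < WARMUP_STEPS:
        # Linear warmup from 0 up to the peak learning rate.
        return PEAK_LR * step / WARMUP_STEPS
    # Cosine decay from the peak down to 0 over the remaining steps.
    progress = (step - WARMUP_STEPS) / (TRAINING_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))

print(lr_at(0))      # 0.0 (start of warmup)
print(lr_at(139))    # 2e-05 (peak, end of warmup)
print(lr_at(13966))  # 0.0 (end of training)
```

Note that the effective batch size of 64 listed above is the product of the per-device batch size (1), the number of devices (4), and the gradient-accumulation steps (16).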
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|:--------------|:-------|:------|:----------------|
| 8.9472 | 0.0143 | 200 | 8.9381 |
| 6.7664 | 0.0286 | 400 | 6.7485 |
| 6.6429 | 0.0430 | 600 | 6.6299 |
| 6.5725 | 0.0573 | 800 | 6.5598 |
| 6.4746 | 0.0716 | 1000 | 6.4666 |
| 6.345 | 0.0859 | 1200 | 6.3290 |
| 6.1452 | 0.1002 | 1400 | 6.1231 |
| 5.9711 | 0.1146 | 1600 | 5.9283 |
| 5.8076 | 0.1289 | 1800 | 5.7896 |
| 5.718 | 0.1432 | 2000 | 5.6944 |
| 5.6422 | 0.1575 | 2200 | 5.6219 |
| 5.5956 | 0.1718 | 2400 | 5.5653 |
| 5.5424 | 0.1862 | 2600 | 5.5163 |
| 5.4527 | 0.2005 | 2800 | 5.4252 |
| 4.7472 | 0.2148 | 3000 | 4.6523 |
| 4.5528 | 0.2291 | 3200 | 4.4846 |
| 4.503 | 0.2434 | 3400 | 4.3817 |
| 4.427 | 0.2578 | 3600 | 4.3165 |
| 4.4322 | 0.2721 | 3800 | 4.2725 |
| 4.3265 | 0.2864 | 4000 | 4.2409 |
| 4.3255 | 0.3007 | 4200 | 4.2157 |
| 4.322 | 0.3150 | 4400 | 4.1930 |
| 4.1982 | 0.3294 | 4600 | 4.1759 |
| 4.2197 | 0.3437 | 4800 | 4.1609 |
| 4.2109 | 0.3580 | 5000 | 4.1478 |
| 4.1553 | 0.3723 | 5200 | 4.1329 |
| 4.169 | 0.3866 | 5400 | 4.1215 |
| 4.2068 | 0.4010 | 5600 | 4.1093 |
| 4.182 | 0.4153 | 5800 | 4.0969 |
| 4.2148 | 0.4296 | 6000 | 4.0841 |
| 4.0511 | 0.4439 | 6200 | 4.0716 |
| 4.0997 | 0.4582 | 6400 | 4.0592 |
| 4.0322 | 0.4726 | 6600 | 4.0488 |
| 3.9972 | 0.4869 | 6800 | 4.0372 |
| 4.0335 | 0.5012 | 7000 | 4.0258 |
| 4.0742 | 0.5155 | 7200 | 4.0168 |
| 4.003 | 0.5298 | 7400 | 4.0082 |
| 4.0007 | 0.5442 | 7600 | 3.9992 |
| 4.1114 | 0.5585 | 7800 | 3.9898 |
| 3.8742 | 0.5728 | 8000 | 3.9831 |
| 4.0346 | 0.5871 | 8200 | 3.9765 |
| 3.8871 | 0.6014 | 8400 | 3.9686 |
| 3.9689 | 0.6158 | 8600 | 3.9626 |
| 4.0003 | 0.6301 | 8800 | 3.9580 |
| 4.0529 | 0.6444 | 9000 | 3.9496 |
| 3.9973 | 0.6587 | 9200 | 3.9456 |
| 4.0418 | 0.6730 | 9400 | 3.9409 |
| 4.0237 | 0.6874 | 9600 | 3.9355 |
| 3.9256 | 0.7017 | 9800 | 3.9299 |
| 3.8549 | 0.7160 | 10000 | 3.9249 |
| 3.9872 | 0.7303 | 10200 | 3.9215 |
| 3.9918 | 0.7446 | 10400 | 3.9180 |
| 4.0075 | 0.7590 | 10600 | 3.9137 |
| 3.9235 | 0.7733 | 10800 | 3.9107 |
| 3.9416 | 0.7876 | 11000 | 3.9069 |
| 3.9939 | 0.8019 | 11200 | 3.9053 |
| 4.0625 | 0.8162 | 11400 | 3.9030 |
| 3.9773 | 0.8306 | 11600 | 3.9010 |
| 3.8279 | 0.8449 | 11800 | 3.8990 |
| 3.8631 | 0.8592 | 12000 | 3.8970 |
| 3.8593 | 0.8735 | 12200 | 3.8953 |
| 3.9531 | 0.8878 | 12400 | 3.8938 |
| 3.8922 | 0.9022 | 12600 | 3.8927 |
| 3.9151 | 0.9165 | 12800 | 3.8917 |
| 3.9119 | 0.9308 | 13000 | 3.8910 |
| 3.9261 | 0.9451 | 13200 | 3.8905 |
| 3.9169 | 0.9594 | 13400 | 3.8903 |
| 3.8439 | 0.9738 | 13600 | 3.8900 |
| 3.8795 | 0.9881 | 13800 | 3.8900 |
### Framework versions
- Transformers 4.56.1
- PyTorch 2.8.0a0+5228986c39.nv25.05
- Datasets 4.0.0
- Tokenizers 0.22.0