TBD-LLaMA-2B-Final-Direction-2B

This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 3.8900
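
If this is the usual per-token cross-entropy loss in nats (the default for causal language modeling with 🤗 Transformers, an assumption since the loss function is not documented here), it corresponds to a validation perplexity of exp(3.8900) ≈ 48.9.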

Model description

More information needed

Intended uses & limitations

More information needed
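
No usage details are documented. Assuming this is a standard causal language model stored in safetensors, a minimal loading sketch with 🤗 Transformers might look like the following; the repository id and generation settings are placeholders, not taken from this model card.

```python
# Minimal loading sketch; the repository id below is a placeholder and the
# generation settings are illustrative assumptions, not documented values.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "your-org/TBD-LLaMA-2B-Final-Direction-2B"  # placeholder repo id

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

inputs = tokenizer("Hello, world", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```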

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the TrainingArguments sketch after this list):

  • learning_rate: 2e-05
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 64
  • total_eval_batch_size: 4
  • optimizer: AdamW (adamw_torch_fused) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 139
  • training_steps: 13966
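
The training script itself is not published. As an illustrative sketch only (not the authors' actual configuration), the hyperparameters above map roughly onto 🤗 Transformers TrainingArguments as shown below; the output directory, evaluation cadence, and logging cadence are assumptions inferred from the 200-step intervals in the results table.

```python
# Illustrative sketch of the listed hyperparameters as TrainingArguments.
# Anything not in the list above (output_dir, eval/logging cadence) is assumed.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="tbd-llama-2b",          # assumed path
    learning_rate=2e-5,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=16,     # 1 sample x 4 GPUs x 16 steps = 64 effective batch
    seed=42,
    optim="adamw_torch_fused",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_steps=139,
    max_steps=13966,
    eval_strategy="steps",              # evaluation every 200 steps, per the table below
    eval_steps=200,
    logging_steps=200,
)
```

With per_device_train_batch_size=1, 4 GPUs, and gradient_accumulation_steps=16, the effective batch size is 1 × 4 × 16 = 64, matching total_train_batch_size above. The multi-GPU (4-device) setup would come from how the script is launched (e.g. torchrun or accelerate), not from these arguments.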

Training results

| Training Loss | Epoch  | Step  | Validation Loss |
|:-------------:|:------:|:-----:|:---------------:|
| 8.9472        | 0.0143 | 200   | 8.9381          |
| 6.7664        | 0.0286 | 400   | 6.7485          |
| 6.6429        | 0.0430 | 600   | 6.6299          |
| 6.5725        | 0.0573 | 800   | 6.5598          |
| 6.4746        | 0.0716 | 1000  | 6.4666          |
| 6.345         | 0.0859 | 1200  | 6.3290          |
| 6.1452        | 0.1002 | 1400  | 6.1231          |
| 5.9711        | 0.1146 | 1600  | 5.9283          |
| 5.8076        | 0.1289 | 1800  | 5.7896          |
| 5.718         | 0.1432 | 2000  | 5.6944          |
| 5.6422        | 0.1575 | 2200  | 5.6219          |
| 5.5956        | 0.1718 | 2400  | 5.5653          |
| 5.5424        | 0.1862 | 2600  | 5.5163          |
| 5.4527        | 0.2005 | 2800  | 5.4252          |
| 4.7472        | 0.2148 | 3000  | 4.6523          |
| 4.5528        | 0.2291 | 3200  | 4.4846          |
| 4.503         | 0.2434 | 3400  | 4.3817          |
| 4.427         | 0.2578 | 3600  | 4.3165          |
| 4.4322        | 0.2721 | 3800  | 4.2725          |
| 4.3265        | 0.2864 | 4000  | 4.2409          |
| 4.3255        | 0.3007 | 4200  | 4.2157          |
| 4.322         | 0.3150 | 4400  | 4.1930          |
| 4.1982        | 0.3294 | 4600  | 4.1759          |
| 4.2197        | 0.3437 | 4800  | 4.1609          |
| 4.2109        | 0.3580 | 5000  | 4.1478          |
| 4.1553        | 0.3723 | 5200  | 4.1329          |
| 4.169         | 0.3866 | 5400  | 4.1215          |
| 4.2068        | 0.4010 | 5600  | 4.1093          |
| 4.182         | 0.4153 | 5800  | 4.0969          |
| 4.2148        | 0.4296 | 6000  | 4.0841          |
| 4.0511        | 0.4439 | 6200  | 4.0716          |
| 4.0997        | 0.4582 | 6400  | 4.0592          |
| 4.0322        | 0.4726 | 6600  | 4.0488          |
| 3.9972        | 0.4869 | 6800  | 4.0372          |
| 4.0335        | 0.5012 | 7000  | 4.0258          |
| 4.0742        | 0.5155 | 7200  | 4.0168          |
| 4.003         | 0.5298 | 7400  | 4.0082          |
| 4.0007        | 0.5442 | 7600  | 3.9992          |
| 4.1114        | 0.5585 | 7800  | 3.9898          |
| 3.8742        | 0.5728 | 8000  | 3.9831          |
| 4.0346        | 0.5871 | 8200  | 3.9765          |
| 3.8871        | 0.6014 | 8400  | 3.9686          |
| 3.9689        | 0.6158 | 8600  | 3.9626          |
| 4.0003        | 0.6301 | 8800  | 3.9580          |
| 4.0529        | 0.6444 | 9000  | 3.9496          |
| 3.9973        | 0.6587 | 9200  | 3.9456          |
| 4.0418        | 0.6730 | 9400  | 3.9409          |
| 4.0237        | 0.6874 | 9600  | 3.9355          |
| 3.9256        | 0.7017 | 9800  | 3.9299          |
| 3.8549        | 0.7160 | 10000 | 3.9249          |
| 3.9872        | 0.7303 | 10200 | 3.9215          |
| 3.9918        | 0.7446 | 10400 | 3.9180          |
| 4.0075        | 0.7590 | 10600 | 3.9137          |
| 3.9235        | 0.7733 | 10800 | 3.9107          |
| 3.9416        | 0.7876 | 11000 | 3.9069          |
| 3.9939        | 0.8019 | 11200 | 3.9053          |
| 4.0625        | 0.8162 | 11400 | 3.9030          |
| 3.9773        | 0.8306 | 11600 | 3.9010          |
| 3.8279        | 0.8449 | 11800 | 3.8990          |
| 3.8631        | 0.8592 | 12000 | 3.8970          |
| 3.8593        | 0.8735 | 12200 | 3.8953          |
| 3.9531        | 0.8878 | 12400 | 3.8938          |
| 3.8922        | 0.9022 | 12600 | 3.8927          |
| 3.9151        | 0.9165 | 12800 | 3.8917          |
| 3.9119        | 0.9308 | 13000 | 3.8910          |
| 3.9261        | 0.9451 | 13200 | 3.8905          |
| 3.9169        | 0.9594 | 13400 | 3.8903          |
| 3.8439        | 0.9738 | 13600 | 3.8900          |
| 3.8795        | 0.9881 | 13800 | 3.8900          |

Framework versions

  • Transformers 4.56.1
  • Pytorch 2.8.0a0+5228986c39.nv25.05
  • Datasets 4.0.0
  • Tokenizers 0.22.0
Safetensors

  • Model size: 1.9B params
  • Tensor type: F32