cadgpt-gpt2-train

This model is a fine-tuned version of gpt2 on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0298
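
No usage example is included in this card, so the snippet below is only a minimal sketch of loading the checkpoint for generation with transformers. The repository id polaris314/cadgpt-gpt2-train is taken from this card's own page; the prompt is a placeholder, since the training data and expected prompt format are not documented here.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Sketch only: repository id comes from this model card's page; the prompt is
# illustrative because the training dataset is not documented.
model_id = "polaris314/cadgpt-gpt2-train"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Example prompt", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```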

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the TrainingArguments sketch after this list):

  • learning_rate: 5e-05
  • train_batch_size: 2
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 8
  • optimizer: adamw_torch_fused with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 500
  • num_epochs: 2
  • mixed_precision_training: Native AMP
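
The list above maps roughly onto the following TrainingArguments configuration. This is a reconstruction, not the original training script: the output directory is assumed, and the 200-step evaluation interval is inferred from the results table below.

```python
from transformers import TrainingArguments

# Sketch reconstructing the listed hyperparameters; output_dir and the
# evaluation cadence are assumptions, not taken from the original script.
training_args = TrainingArguments(
    output_dir="cadgpt-gpt2-train",     # assumed output directory
    learning_rate=5e-05,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=4,      # effective train batch size: 2 * 4 = 8
    optim="adamw_torch_fused",          # AdamW (fused), betas=(0.9, 0.999), eps=1e-08
    lr_scheduler_type="linear",
    warmup_steps=500,
    num_train_epochs=2,
    fp16=True,                          # Native AMP mixed precision
    eval_strategy="steps",              # results table shows evaluation every 200 steps
    eval_steps=200,
)
```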

Training results

| Training Loss | Epoch  | Step | Validation Loss |
|---------------|--------|------|-----------------|
| 0.2987        | 0.0544 | 200  | 0.1816          |
| 0.1236        | 0.1088 | 400  | 0.0997          |
| 0.0876        | 0.1633 | 600  | 0.0692          |
| 0.0703        | 0.2177 | 800  | 0.0554          |
| 0.0601        | 0.2721 | 1000 | 0.0531          |
| 0.0553        | 0.3265 | 1200 | 0.0453          |
| 0.0514        | 0.3810 | 1400 | 0.0427          |
| 0.0473        | 0.4354 | 1600 | 0.0403          |
| 0.0463        | 0.4898 | 1800 | 0.0404          |
| 0.0461        | 0.5442 | 2000 | 0.0386          |
| 0.0428        | 0.5986 | 2200 | 0.0376          |
| 0.0428        | 0.6531 | 2400 | 0.0367          |
| 0.0395        | 0.7075 | 2600 | 0.0353          |
| 0.0396        | 0.7619 | 2800 | 0.0351          |
| 0.0388        | 0.8163 | 3000 | 0.0347          |
| 0.0383        | 0.8707 | 3200 | 0.0350          |
| 0.0380        | 0.9252 | 3400 | 0.0359          |
| 0.0371        | 0.9796 | 3600 | 0.0343          |
| 0.0364        | 1.0340 | 3800 | 0.0328          |
| 0.0372        | 1.0884 | 4000 | 0.0331          |
| 0.0363        | 1.1429 | 4200 | 0.0324          |
| 0.0351        | 1.1973 | 4400 | 0.0334          |
| 0.0347        | 1.2517 | 4600 | 0.0317          |
| 0.0342        | 1.3061 | 4800 | 0.0315          |
| 0.0344        | 1.3605 | 5000 | 0.0314          |
| 0.0337        | 1.4150 | 5200 | 0.0309          |
| 0.0338        | 1.4694 | 5400 | 0.0310          |
| 0.0334        | 1.5238 | 5600 | 0.0308          |
| 0.0341        | 1.5782 | 5800 | 0.0306          |
| 0.0330        | 1.6327 | 6000 | 0.0319          |
| 0.0335        | 1.6871 | 6200 | 0.0304          |
| 0.0322        | 1.7415 | 6400 | 0.0302          |
| 0.0329        | 1.7959 | 6600 | 0.0301          |
| 0.0336        | 1.8503 | 6800 | 0.0300          |
| 0.0328        | 1.9048 | 7000 | 0.0300          |
| 0.0331        | 1.9592 | 7200 | 0.0298          |
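
Assuming the reported values are the Trainer's usual mean token-level cross-entropy for causal language modeling, the final validation loss of 0.0298 corresponds to a perplexity of roughly exp(0.0298) ≈ 1.03:

```python
import math

# Assumes the logged loss is mean token-level cross-entropy, the Trainer
# default for causal language modeling.
final_eval_loss = 0.0298
print(f"perplexity ≈ {math.exp(final_eval_loss):.3f}")  # ≈ 1.030
```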

Framework versions

  • Transformers 4.56.1
  • Pytorch 2.8.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.22.0