jkazdan/collapse_gemma-2-2b_hs2_replace_iter9_sftsd2

---
license: gemma
base_model: google/gemma-2-2b
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: collapse_gemma-2-2b_hs2_replace_iter9_sftsd2
  results: []
---


# collapse_gemma-2-2b_hs2_replace_iter9_sftsd2

This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset.
It achieves the following results:
- Validation loss: 2.6695 (up from 1.3956 at step 0, before any training updates)
- Input tokens seen during training: 7,754,872
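Converting the losses to perplexity can make them easier to interpret. The minimal sketch below takes the step-0 validation loss (1.3956) and the final validation loss (2.6695). It assumes both are mean per-token cross-entropies in nats, which is the usual convention for causal language models in Transformers. `perplexity` is a hypothetical helper and is not part of any library.

```python
import math


def perplexity(mean_token_nll: float) -> float:
    """Convert a mean per-token cross-entropy (in nats) to perplexity.

    Assumption: the loss is averaged over tokens and uses the natural log.
    """
    return math.exp(mean_token_nll)


# Validation losses copied from this card: step 0 and the final evaluation.
initial_ppl = perplexity(1.3956)
final_ppl = perplexity(2.6695)
print(f"step 0: {initial_ppl:.2f}, final: {final_ppl:.2f}")
# prints: step 0: 4.04, final: 14.43
```

Under that assumption, fine-tuning raised validation perplexity roughly 3.6-fold.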

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 2
- gradient_accumulation_steps: 16
- total_train_batch_size: 128 (train_batch_size × gradient_accumulation_steps = 8 × 16)
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
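The batch-size and scheduler settings above combine as follows. This is an independent sketch of a constant-with-warmup schedule: the learning rate ramps linearly from 0 to the peak over the warmup steps and then stays flat. It is not the Transformers source. The total step count of 158 is an assumption estimated from the results table, where step 155 is logged at epoch 0.9764 (155 / 0.9764 ≈ 158.7). Rounding the warmup fraction up to a whole step is also an assumption. All helper names are hypothetical.

```python
import math

# Values copied from the hyperparameter list.
LEARNING_RATE = 8e-06
TRAIN_BATCH_SIZE = 8
GRADIENT_ACCUMULATION_STEPS = 16
WARMUP_RATIO = 0.05

# Assumption: roughly 158 optimizer steps in the single epoch (estimated from the table).
NUM_TRAINING_STEPS = 158


def effective_batch_size(per_step: int, accumulation_steps: int, num_devices: int = 1) -> int:
    """Number of sequences that contribute to one optimizer update."""
    return per_step * accumulation_steps * num_devices


def warmup_steps(num_training_steps: int, warmup_ratio: float) -> int:
    """Round the warmup fraction up to a whole number of optimizer steps."""
    return math.ceil(num_training_steps * warmup_ratio)


def constant_with_warmup_lr(step: int, base_lr: float, num_warmup_steps: int) -> float:
    """Linear ramp from 0 to base_lr during warmup, then hold base_lr constant."""
    if step < num_warmup_steps:
        return base_lr * step / max(1, num_warmup_steps)
    return base_lr


batch = effective_batch_size(TRAIN_BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS)
n_warmup = warmup_steps(NUM_TRAINING_STEPS, WARMUP_RATIO)
print("effective batch size:", batch, "warmup steps:", n_warmup)
for step in (0, 4, n_warmup, NUM_TRAINING_STEPS - 1):
    print(step, constant_with_warmup_lr(step, LEARNING_RATE, n_warmup))
```

A 5% warmup ratio therefore covers only about 8 of the roughly 158 updates. For the rest of the epoch the model trains at the full 8e-06 rate.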

### Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log | 0 | 0 | 1.3956 | 0 |
| 1.5959 | 0.0315 | 5 | 1.3066 | 254632 |
| 1.0717 | 0.0630 | 10 | 1.2465 | 502128 |
| 0.7125 | 0.0945 | 15 | 1.3592 | 744976 |
| 0.504 | 0.1260 | 20 | 1.5120 | 986472 |
| 0.2848 | 0.1575 | 25 | 1.6652 | 1237336 |
| 0.2452 | 0.1890 | 30 | 1.8288 | 1482344 |
| 0.1578 | 0.2205 | 35 | 1.9980 | 1732136 |
| 0.0569 | 0.2520 | 40 | 2.1960 | 1978848 |
| 0.0667 | 0.2835 | 45 | 2.3046 | 2223360 |
| 0.0341 | 0.3150 | 50 | 2.4331 | 2460800 |
| 0.0289 | 0.3465 | 55 | 2.4497 | 2702840 |
| 0.027 | 0.3780 | 60 | 2.5245 | 2953304 |
| 0.0265 | 0.4094 | 65 | 2.5800 | 3203880 |
| 0.0271 | 0.4409 | 70 | 2.5911 | 3452328 |
| 0.0265 | 0.4724 | 75 | 2.6014 | 3694936 |
| 0.0237 | 0.5039 | 80 | 2.6018 | 3940776 |
| 0.0253 | 0.5354 | 85 | 2.5984 | 4186160 |
| 0.0254 | 0.5669 | 90 | 2.6081 | 4427280 |
| 0.026 | 0.5984 | 95 | 2.6275 | 4674224 |
| 0.0249 | 0.6299 | 100 | 2.6499 | 4922464 |
| 0.0263 | 0.6614 | 105 | 2.6559 | 5169512 |
| 0.0295 | 0.6929 | 110 | 2.6640 | 5411768 |
| 0.0241 | 0.7244 | 115 | 2.6679 | 5655504 |
| 0.0259 | 0.7559 | 120 | 2.6763 | 5901264 |
| 0.0255 | 0.7874 | 125 | 2.6777 | 6144528 |
| 0.024 | 0.8189 | 130 | 2.6766 | 6387936 |
| 0.0228 | 0.8504 | 135 | 2.6707 | 6633736 |
| 0.0258 | 0.8819 | 140 | 2.6821 | 6868528 |
| 0.024 | 0.9134 | 145 | 2.6846 | 7115712 |
| 0.0257 | 0.9449 | 150 | 2.6769 | 7363728 |
| 0.0263 | 0.9764 | 155 | 2.6716 | 7603744 |
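The table shows the two losses diverging. Training loss falls to about 0.025, while validation loss reaches its minimum of 1.2465 at step 10 and then climbs to roughly 2.67. This pattern usually indicates overfitting to the training data. The sketch below finds the best validation step and the train/validation gap for a subset of rows copied from the table. The helper names are hypothetical.

```python
# (step, training loss, validation loss) for a subset of rows copied from the
# table; "No log" at step 0 is stored as None.
ROWS = [
    (0, None, 1.3956),
    (5, 1.5959, 1.3066),
    (10, 1.0717, 1.2465),
    (15, 0.7125, 1.3592),
    (20, 0.504, 1.5120),
    (50, 0.0341, 2.4331),
    (100, 0.0249, 2.6499),
    (155, 0.0263, 2.6716),
]


def best_validation_step(rows):
    """Return (step, validation loss) for the row with the lowest validation loss."""
    step, _, val = min(rows, key=lambda r: r[2])
    return step, val


def generalization_gaps(rows):
    """Validation loss minus training loss for every row that logged a training loss."""
    return {step: round(val - train, 4) for step, train, val in rows if train is not None}


best_step, best_val = best_validation_step(ROWS)
gaps = generalization_gaps(ROWS)
print(f"lowest validation loss {best_val} at step {best_step}")
print(f"gap at step 155: {gaps[155]}")
```

If the goal were the lowest validation loss, an early-stopping rule keyed on this column would have kept a checkpoint from around step 10.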


### Framework versions

- Transformers 4.44.0
- PyTorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
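Before reproducing the run, you can compare the local environment against the versions listed above. This minimal sketch assumes the packages are installed under their PyPI distribution names: `transformers`, `torch`, `datasets` and `tokenizers`. It ignores the `+cu121` local build label when comparing versions. `version_mismatches` and the other helpers are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed on this card, keyed by PyPI distribution name.
PINNED = {
    "transformers": "4.44.0",
    "torch": "2.4.0+cu121",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}


def public_version(v: str) -> str:
    """Drop a PEP 440 local label such as '+cu121'."""
    return v.split("+", 1)[0]


def version_mismatches(installed: dict) -> dict:
    """Map package -> (pinned, installed) for packages that differ or are missing."""
    out = {}
    for name, pinned in PINNED.items():
        found = installed.get(name)
        if found is None or public_version(found) != public_version(pinned):
            out[name] = (pinned, found)
    return out


def installed_versions() -> dict:
    """Look up installed versions; missing packages map to None."""
    result = {}
    for name in PINNED:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    print(version_mismatches(installed_versions()))
```

An empty dictionary means the environment matches the listed versions, up to the CUDA build label.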