---
license: gemma
library_name: peft
tags:
- alignment-handbook
- trl
- sft
- generated_from_trainer
base_model: google/gemma-2b
datasets:
- llama-duo/synth_summarize_dataset_dedup
model-index:
- name: gemma2b-summarize-gemini1_5flash-256k
  results: []
---
					
						
<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# gemma2b-summarize-gemini1_5flash-256k
					
						
This model is a fine-tuned version of [google/gemma-2b](https://huggingface.co/google/gemma-2b) on the llama-duo/synth_summarize_dataset_dedup dataset.
It achieves the following results on the evaluation set:
- Loss: 2.5669 (validation loss at the final logged step, 2070)
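
Because the card metadata lists `peft` as the library and `google/gemma-2b` as the base model, the weights are presumably published as an adapter that is loaded on top of the base model. The following is a minimal sketch under that assumption. `ADAPTER_ID` is a hypothetical placeholder (the card gives the model name but not its namespace), and passing raw text as the prompt is also an assumption, since the card does not document a prompt format.

```python
BASE_MODEL_ID = "google/gemma-2b"  # taken from the card's base_model field
# Hypothetical repository id: the card states the model name but not its namespace.
ADAPTER_ID = "your-namespace/gemma2b-summarize-gemini1_5flash-256k"


def load_model(adapter_id=ADAPTER_ID, base_model_id=BASE_MODEL_ID):
    """Load the base model and attach the PEFT adapter on top of it."""
    # Imported lazily so this module can be imported without the heavy dependencies.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    tokenizer = AutoTokenizer.from_pretrained(base_model_id)
    base = AutoModelForCausalLM.from_pretrained(base_model_id)
    model = PeftModel.from_pretrained(base, adapter_id)
    model.eval()
    return tokenizer, model


def summarize(tokenizer, model, text, max_new_tokens=128):
    """Generate a continuation for `text` (the prompt format is an assumption)."""
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Keep only the newly generated tokens, dropping the echoed prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `load_model()` downloads both the base weights and the adapter, so it needs network access and acceptance of the Gemma license for the base model.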
					
						
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
					
						
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0002
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 2
- total_train_batch_size: 128
- total_eval_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 10
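
The totals in the list follow from the per-device values, and the scheduler settings imply a particular learning-rate curve. The sketch below reproduces that arithmetic. It assumes the cosine schedule is a linear warmup followed by a half-cosine decay to zero, and that the last step in the results table (2070) is the total number of optimizer steps; neither is stated explicitly in the card.

```python
import math

# Values copied from the hyperparameter list.
PER_DEVICE_TRAIN_BATCH = 8
PER_DEVICE_EVAL_BATCH = 8
NUM_DEVICES = 8
GRAD_ACCUM_STEPS = 2
PEAK_LR = 2e-4
WARMUP_RATIO = 0.1
# Assumption: the final logged step (2070) is the total number of optimizer steps.
TOTAL_STEPS = 2070

# Effective batch sizes across all devices.
total_train_batch = PER_DEVICE_TRAIN_BATCH * NUM_DEVICES * GRAD_ACCUM_STEPS  # 128
total_eval_batch = PER_DEVICE_EVAL_BATCH * NUM_DEVICES  # 64


def cosine_lr(step, total_steps=TOTAL_STEPS, peak_lr=PEAK_LR, warmup_ratio=WARMUP_RATIO):
    """Linear warmup to peak_lr, then half-cosine decay to zero (assumed shape)."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print("total_train_batch_size:", total_train_batch)
    print("total_eval_batch_size:", total_eval_batch)
    for step in (0, 207, 1035, 2070):
        print(step, cosine_lr(step))
```

Under these assumptions the warmup covers roughly the first epoch, after which the learning rate decays from its peak of 0.0002 toward zero at the final step.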
					
						
### Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 1.0246        | 0.9976 | 207  | 2.4550          |
| 0.9556        | 2.0    | 415  | 2.4530          |
| 0.9114        | 2.9976 | 622  | 2.4641          |
| 0.8927        | 4.0    | 830  | 2.4882          |
| 0.8752        | 4.9976 | 1037 | 2.5081          |
| 0.8602        | 6.0    | 1245 | 2.5277          |
| 0.8464        | 6.9976 | 1452 | 2.5513          |
| 0.8353        | 8.0    | 1660 | 2.5615          |
| 0.8267        | 8.9976 | 1867 | 2.5674          |
| 0.827         | 9.9759 | 2070 | 2.5669          |
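
In the table, training loss falls throughout while validation loss reaches its minimum early and then rises, a pattern often associated with overfitting. The sketch below uses only the numbers from the table to locate the row with the lowest validation loss and to show how the gap between the two losses changes. The helper names are illustrative and not part of any training script.

```python
# (training_loss, epoch, step, validation_loss) rows copied from the results table.
ROWS = [
    (1.0246, 0.9976, 207, 2.4550),
    (0.9556, 2.0, 415, 2.4530),
    (0.9114, 2.9976, 622, 2.4641),
    (0.8927, 4.0, 830, 2.4882),
    (0.8752, 4.9976, 1037, 2.5081),
    (0.8602, 6.0, 1245, 2.5277),
    (0.8464, 6.9976, 1452, 2.5513),
    (0.8353, 8.0, 1660, 2.5615),
    (0.8267, 8.9976, 1867, 2.5674),
    (0.827, 9.9759, 2070, 2.5669),
]


def best_row(rows):
    """Return the row with the lowest validation loss."""
    return min(rows, key=lambda row: row[3])


def train_val_gap(rows):
    """Return (step, validation loss minus training loss) for each logged row."""
    return [(step, round(val - train, 4)) for train, _epoch, step, val in rows]


best = best_row(ROWS)
gaps = train_val_gap(ROWS)

if __name__ == "__main__":
    print(f"lowest validation loss {best[3]} at step {best[2]} (epoch {best[1]})")
    for step, gap in gaps:
        print(f"step {step}: gap {gap}")
```

The lowest validation loss (2.4530) is at step 415, so a checkpoint from around epoch 2 may be worth comparing against the final one if intermediate checkpoints were saved.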
					
						
### Framework versions

- PEFT 0.11.1
- Transformers 4.41.2
- Pytorch 2.3.1+cu121
- Datasets 2.19.2
- Tokenizers 0.19.1
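
To reproduce the setup, it can help to compare a local environment against the versions listed above. This is a minimal sketch using only the standard library; the dictionary keys are assumed to be the PyPI distribution names for the listed frameworks.

```python
from importlib import metadata

# Versions listed in the card; keys are the assumed PyPI distribution names.
EXPECTED = {
    "peft": "0.11.1",
    "transformers": "4.41.2",
    "torch": "2.3.1+cu121",
    "datasets": "2.19.2",
    "tokenizers": "0.19.1",
}


def compare_versions(expected=EXPECTED):
    """Map each package name to (expected, installed or None, versions match)."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (wanted, installed, installed == wanted)
    return report


if __name__ == "__main__":
    for name, (wanted, installed, ok) in compare_versions().items():
        status = "ok" if ok else "differs"
        print(f"{name}: expected {wanted}, installed {installed} ({status})")
```

A mismatch does not necessarily mean the model will fail to load; the listed versions are simply the ones recorded at training time.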