---
license: gemma
base_model: google/gemma-2-2b
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: collapse_gemma-2-2b_hs2_replace_iter5_sftsd1
  results: []
---


# collapse_gemma-2-2b_hs2_replace_iter5_sftsd1

This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 2.4466
- Num Input Tokens Seen: 8280920
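
If the reported loss is the mean per-token cross-entropy in nats, it can be read as a perplexity by exponentiating it. That is the usual setting for causal language modeling with the Trainer, but this card does not state it, so treat it as an assumption. The sketch below compares the final evaluation loss with the step-0 validation loss from the results table:

```python
import math

# Assumption: both losses are mean per-token cross-entropy in nats.
losses = {
    "step 0 (before fine-tuning)": 1.3956,  # first row of the results table
    "final evaluation": 2.4466,             # value reported above
}

def perplexity(loss: float) -> float:
    """Convert a mean natural-log cross-entropy loss into perplexity."""
    return math.exp(loss)

for name, loss in losses.items():
    print(f"{name}: loss={loss:.4f}, perplexity={perplexity(loss):.2f}")
```

Under that assumption, the final loss corresponds to a perplexity of roughly 11.5. The step-0 value corresponds to roughly 4.0.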

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 1
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
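
The batch-size arithmetic and the `constant_with_warmup` schedule can be sketched in plain Python. This is an illustration, not the training script, and it rests on three assumptions:

- It assumes a single device, which is consistent with 8 × 16 = 128.
- It uses a hypothetical total of 160 optimizer steps, because the card does not report the exact count.
- It rounds the warmup length up from the ratio, which is the rounding Transformers applies when converting `warmup_ratio` into steps.

```python
import math

# Values copied from the hyperparameter list above.
learning_rate = 8e-06
train_batch_size = 8
gradient_accumulation_steps = 16
warmup_ratio = 0.05
num_devices = 1  # assumption: implied by 8 * 16 == 128, not stated explicitly

# Effective batch size per optimizer step.
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices

def warmup_steps(num_training_steps: int, ratio: float) -> int:
    """Warmup length derived from a ratio, rounded up."""
    return math.ceil(num_training_steps * ratio)

def constant_with_warmup_lr(step: int, base_lr: float, num_warmup_steps: int) -> float:
    """Linear ramp from 0 to base_lr during warmup, then held constant."""
    if step < num_warmup_steps:
        return base_lr * step / max(1, num_warmup_steps)
    return base_lr

# Hypothetical run length for illustration; the card does not state max_steps.
num_training_steps = 160
n_warmup = warmup_steps(num_training_steps, warmup_ratio)

print(f"total_train_batch_size = {total_train_batch_size}")
print(f"warmup steps = {n_warmup}")
for step in (0, 4, n_warmup, 100):
    print(step, constant_with_warmup_lr(step, learning_rate, n_warmup))
```

With these assumptions, the learning rate climbs linearly over the first 8 optimizer steps. It then stays at 8e-06 for the rest of the single epoch.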

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3956          | 0                 |
| 1.6329        | 0.0318 | 5    | 1.3100          | 261936            |
| 1.0812        | 0.0635 | 10   | 1.2393          | 527984            |
| 0.8509        | 0.0953 | 15   | 1.2939          | 798792            |
| 0.5856        | 0.1271 | 20   | 1.4435          | 1068224           |
| 0.3945        | 0.1589 | 25   | 1.5981          | 1333664           |
| 0.2591        | 0.1906 | 30   | 1.7370          | 1600920           |
| 0.2297        | 0.2224 | 35   | 1.9540          | 1862864           |
| 0.1491        | 0.2542 | 40   | 2.0318          | 2119104           |
| 0.0693        | 0.2859 | 45   | 2.2388          | 2377720           |
| 0.0509        | 0.3177 | 50   | 2.3196          | 2637816           |
| 0.0475        | 0.3495 | 55   | 2.3864          | 2900952           |
| 0.034         | 0.3813 | 60   | 2.4376          | 3166456           |
| 0.0324        | 0.4130 | 65   | 2.4449          | 3436144           |
| 0.034         | 0.4448 | 70   | 2.4523          | 3702280           |
| 0.0326        | 0.4766 | 75   | 2.4438          | 3966328           |
| 0.0336        | 0.5083 | 80   | 2.4354          | 4221440           |
| 0.0313        | 0.5401 | 85   | 2.4139          | 4486432           |
| 0.0283        | 0.5719 | 90   | 2.3846          | 4751320           |
| 0.0301        | 0.6037 | 95   | 2.3932          | 5019592           |
| 0.0284        | 0.6354 | 100  | 2.4044          | 5280712           |
| 0.0256        | 0.6672 | 105  | 2.4084          | 5539944           |
| 0.0329        | 0.6990 | 110  | 2.4300          | 5807632           |
| 0.0266        | 0.7307 | 115  | 2.4236          | 6068760           |
| 0.0267        | 0.7625 | 120  | 2.4100          | 6331712           |
| 0.0268        | 0.7943 | 125  | 2.4094          | 6593680           |
| 0.0272        | 0.8261 | 130  | 2.4229          | 6859744           |
| 0.0296        | 0.8578 | 135  | 2.4294          | 7118040           |
| 0.027         | 0.8896 | 140  | 2.4374          | 7383424           |
| 0.0264        | 0.9214 | 145  | 2.4434          | 7650680           |
| 0.0248        | 0.9531 | 150  | 2.4362          | 7915376           |
| 0.0264        | 0.9849 | 155  | 2.4400          | 8174680           |
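
In this table, validation loss bottoms out early and then climbs while training loss keeps falling. The sketch below uses the (step, validation loss) pairs copied from the table. It locates the lowest validation loss and the first step at which validation loss exceeds its step-0 value. It only illustrates how to read the table and is not part of the original training code.

```python
# (step, validation_loss) pairs copied from the table above.
val_loss = [
    (0, 1.3956), (5, 1.3100), (10, 1.2393), (15, 1.2939),
    (20, 1.4435), (25, 1.5981), (30, 1.7370), (35, 1.9540),
    (40, 2.0318), (45, 2.2388), (50, 2.3196), (55, 2.3864),
    (60, 2.4376), (65, 2.4449), (70, 2.4523), (75, 2.4438),
    (80, 2.4354), (85, 2.4139), (90, 2.3846), (95, 2.3932),
    (100, 2.4044), (105, 2.4084), (110, 2.4300), (115, 2.4236),
    (120, 2.4100), (125, 2.4094), (130, 2.4229), (135, 2.4294),
    (140, 2.4374), (145, 2.4434), (150, 2.4362), (155, 2.4400),
]

baseline = val_loss[0][1]  # validation loss before any optimizer step
best_step, best_loss = min(val_loss, key=lambda pair: pair[1])
first_worse_step = next(step for step, loss in val_loss if loss > baseline)

print(f"lowest validation loss {best_loss} at step {best_step}")
print(f"validation loss first exceeds the step-0 value at step {first_worse_step}")
```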


### Framework versions

- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1