---
library_name: transformers
tags:
- Assistant
license: apache-2.0
datasets:
- teknium/OpenHermes-2.5
language:
- en
base_model:
- meta-llama/Llama-3.2-3B
---

# Llama-3.2-3B LoRA Fine-tune on OpenHermes

## 📖 Overview

This model is a **LoRA fine-tuned version of `meta-llama/Llama-3.2-3B`**, trained on the [OpenHermes-2.5 dataset](https://huggingface.co/datasets/teknium/OpenHermes-2.5).
The goal of this run was to adapt Llama-3.2-3B for improved instruction following using a high-quality, multi-domain SFT dataset.

Fine-tuning was performed with **parameter-efficient fine-tuning (PEFT)** using **LoRA adapters**. Only ~0.75% of the model's parameters were trained, keeping compute and memory requirements low while still yielding strong gains.

---

## ⚙️ Training Configuration

* **Base Model:** `meta-llama/Llama-3.2-3B`
* **Method:** QLoRA (LoRA rank 16, α=32, dropout=0.05)
* **Trainable Parameters:** 24.3M / 3.24B (~0.75%)
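
For reference, the adapter setup above corresponds to a `peft` configuration along these lines. This is a minimal sketch, not the exact training script: the `target_modules` list is an assumption, and the 4-bit base-model loading that QLoRA implies is omitted.

```python
from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=16,                      # LoRA rank
    lora_alpha=32,             # scaling factor (α)
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    # Assumed target modules for illustration; adjust to the modules actually adapted in this run
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

model = get_peft_model(base_model, lora_config)  # base_model: the loaded Llama-3.2-3B
model.print_trainable_parameters()               # should report roughly 24.3M trainable parameters
```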

**Training Arguments:**

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./llama_finetune_lora",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    learning_rate=2e-4,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    weight_decay=0.01,
    logging_steps=200,

    evaluation_strategy="steps",
    eval_steps=200,
    save_strategy="steps",
    save_steps=1000,
    save_total_limit=2,
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    greater_is_better=False,

    bf16=True,  # A100 support
    fp16=False,
    gradient_checkpointing=True,
    torch_compile=False,
    report_to="none",
    seed=42
)
```
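
These arguments plug into a standard `Trainer` loop. The sketch below is illustrative rather than the actual training script: `train_ds` and `eval_ds` stand in for tokenized OpenHermes splits, `model` is the PEFT-wrapped base model, and `tok` is the tokenizer.

```python
from transformers import Trainer, DataCollatorForLanguageModeling

trainer = Trainer(
    model=model,                    # PEFT-wrapped Llama-3.2-3B
    args=training_args,
    train_dataset=train_ds,         # tokenized OpenHermes training split (placeholder)
    eval_dataset=eval_ds,           # held-out split that produces the eval_loss reported below
    data_collator=DataCollatorForLanguageModeling(tok, mlm=False),  # causal-LM collator
)
trainer.train()
```

With `per_device_train_batch_size=2` and `gradient_accumulation_steps=8`, each optimizer step sees an effective batch of 16 sequences per device.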

---

## 📊 Training Metrics

The run was stopped at **2000 steps** (~4.5 h on an A100). Training loss improved steadily and validation loss stabilized around 0.20.

| Step | Training Loss | Validation Loss |
| ---- | ------------- | --------------- |
| 200  | 1.2781        | 0.2202          |
| 400  | 0.2167        | 0.2134          |
| 600  | 0.2139        | 0.2098          |
| 800  | 0.2120        | 0.2072          |
| 1000 | 0.2085        | 0.2057          |
| 1200 | 0.1996        | 0.2043          |
| 1400 | 0.2056        | 0.2034          |
| 1600 | 0.2016        | 0.2023          |
| 1800 | 0.2000        | 0.2012          |
| 2000 | 0.2027        | 0.2005          |

📉 **Validation loss converged to ~0.20**, indicating effective adaptation.

---

## 🚀 Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = "meta-llama/Llama-3.2-3B"
adapter = "kunjcr2/llama3-3b-lora-openhermes"  # replace with your Hub repo

# Load base + adapter
tok = AutoTokenizer.from_pretrained(adapter)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype="auto", device_map="auto")
model = PeftModel.from_pretrained(model, adapter)

# Generate
prompt = "Explain the concept of binary search trees."
inputs = tok(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tok.decode(outputs[0], skip_special_tokens=True))
```

---

## 📌 Notes

* Training was run with **bf16 + gradient checkpointing** on an A100 (40 GB).
* Only the LoRA adapters are uploaded (small footprint); use them together with the base model, or merge them for standalone deployment (see the sketch below).
* The repo includes `adapter_model.safetensors`, `adapter_config.json`, tokenizer files, and this README.
* Training was stopped early at **2000 steps (~17% of the planned run)** because the loss had already converged.
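
If you prefer a single standalone checkpoint instead of base + adapter, the LoRA weights can be folded into the base model with `peft`. A minimal sketch, assuming the model and tokenizer were loaded as in the Usage section; the output path is only an example:

```python
# Fold the LoRA deltas into the base weights and save a standalone checkpoint.
merged = model.merge_and_unload()                        # `model` is the PeftModel from the Usage section
merged.save_pretrained("./llama3-3b-openhermes-merged")  # example output path
tok.save_pretrained("./llama3-3b-openhermes-merged")
```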

---

✨ If you like this model, feel free to try it out and extend training. Future runs could include more steps, preference tuning (DPO/ORPO), or domain-specific mixtures.