---
library_name: transformers
tags:
- Assistant
license: apache-2.0
datasets:
- teknium/OpenHermes-2.5
language:
- en
base_model:
- meta-llama/Llama-3.2-3B
---
# Llama-3.2-3B LoRA Fine-tune on OpenHermes
## 📖 Overview
This model is a **LoRA fine-tuned version of `meta-llama/Llama-3.2-3B`** on the [OpenHermes dataset](https://huggingface.co/datasets/teknium/OpenHermes-2.5).
The goal of this run was to adapt Llama-3.2-3B for improved instruction-following using a high-quality, multi-domain SFT dataset.
Fine-tuning was performed with **parameter-efficient fine-tuning (PEFT)** using **QLoRA** (a 4-bit quantized base model with LoRA adapters). Only \~0.75% of the model's parameters were trained, keeping compute and memory usage low while still adapting the model effectively.
---
## ⚙️ Training Configuration
**Base Model:** `meta-llama/Llama-3.2-3B`
**Method:** QLoRA (LoRA rank 16, α=32, dropout=0.05)
**Trainable Parameters:** 24.3M / 3.24B (\~0.75%)
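A minimal sketch of the adapter setup implied by these numbers; `target_modules` and the quantization details are assumptions (common choices for Llama-family models), so treat `adapter_config.json` in this repo as the authoritative source:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# QLoRA: load the frozen base model in 4-bit (assumed quantization settings)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-3B",
    quantization_config=bnb_config,
    device_map="auto",
)
base = prepare_model_for_kbit_training(base)  # makes the 4-bit model training-ready

# LoRA hyperparameters from above; target_modules is an assumption
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # should report roughly 24.3M trainable params
```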
**Training Arguments:**
```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./llama_finetune_lora",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,   # effective batch size of 16
    learning_rate=2e-4,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    weight_decay=0.01,
    logging_steps=200,
    evaluation_strategy="steps",
    eval_steps=200,
    save_strategy="steps",
    save_steps=1000,
    save_total_limit=2,
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    greater_is_better=False,
    bf16=True,                       # A100 supports bfloat16
    fp16=False,
    gradient_checkpointing=True,
    torch_compile=False,
    report_to="none",
    seed=42,
)
```
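For context, a hedged sketch of how these arguments might plug into a standard `Trainer` loop; `train_ds` and `eval_ds` are placeholders for tokenized OpenHermes splits, not the actual training script:

```python
from transformers import AutoTokenizer, Trainer, DataCollatorForLanguageModeling

tok = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-3B")
tok.pad_token = tok.eos_token  # Llama tokenizers ship without a pad token

# Causal-LM collator: labels are the input ids (shifted internally), no masking
collator = DataCollatorForLanguageModeling(tokenizer=tok, mlm=False)

trainer = Trainer(
    model=model,             # PEFT-wrapped model from the sketch above
    args=training_args,
    train_dataset=train_ds,  # placeholder: tokenized OpenHermes train split
    eval_dataset=eval_ds,    # placeholder: held-out validation split
    data_collator=collator,
)
trainer.train()
```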
---
## 📊 Training Metrics
The run was stopped at **2000 steps** (\~4.5h on an A100; effective batch size 16 = 2 per device × 8 accumulation steps). Training loss dropped steadily and validation loss stabilized around 0.20.
| Step | Training Loss | Validation Loss |
| ---- | ------------- | --------------- |
| 200 | 1.2781 | 0.2202 |
| 400 | 0.2167 | 0.2134 |
| 600 | 0.2139 | 0.2098 |
| 800 | 0.2120 | 0.2072 |
| 1000 | 0.2085 | 0.2057 |
| 1200 | 0.1996 | 0.2043 |
| 1400 | 0.2056 | 0.2034 |
| 1600 | 0.2016 | 0.2023 |
| 1800 | 0.2000 | 0.2012 |
| 2000 | 0.2027 | 0.2005 |
📉 **Validation loss converged to \~0.20**, indicating effective adaptation.
---
## 🚀 Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = "meta-llama/Llama-3.2-3B"
adapter = "kunjcr2/llama3-3b-lora-openhermes"  # replace with your Hub repo

# Tokenizer files ship with the adapter repo; load the base model, then attach the adapter
tok = AutoTokenizer.from_pretrained(adapter)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype="auto", device_map="auto")
model = PeftModel.from_pretrained(model, adapter)

# Generate
prompt = "Explain the concept of binary search trees."
inputs = tok(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tok.decode(outputs[0], skip_special_tokens=True))
```
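If you prefer a standalone checkpoint (no `peft` dependency at serving time), the adapter can be merged into the base weights. A minimal sketch; the output path is illustrative:

```python
# Merge the LoRA weights into the base model and save a standalone checkpoint
merged = model.merge_and_unload()
merged.save_pretrained("./llama3-3b-openhermes-merged")  # illustrative path
tok.save_pretrained("./llama3-3b-openhermes-merged")
```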
---
## 📌 Notes
* Training was run with **bf16 + gradient checkpointing** on A100 (40GB).
* Only the adapter weights are uploaded (small download); use them together with the base model. For smaller GPUs, see the 4-bit loading sketch after this list.
* Repo includes: `adapter_model.safetensors`, `adapter_config.json`, tokenizer files, and this README.
* Training stopped early at **2000 steps (\~17% of planned)** because validation loss had already plateaued (see the metrics table above).
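For GPUs with less memory than the A100 used in training, the base model can be loaded in 4-bit before attaching the adapter. A minimal sketch, assuming `bitsandbytes` is installed; the quantization settings are assumptions mirroring a typical QLoRA setup:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

# 4-bit base model for low-memory inference (assumed settings)
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-3B",
    quantization_config=bnb,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, "kunjcr2/llama3-3b-lora-openhermes")
```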
---
✨ If you like this model, feel free to try it out and extend training. Future runs could include more steps, preference tuning (DPO/ORPO), or domain-specific mixtures. |