---
library_name: transformers
tags:
- Assistant
license: apache-2.0
datasets:
- teknium/OpenHermes-2.5
language:
- en
base_model:
- meta-llama/Llama-3.2-3B
---
# Llama-3.2-3B LoRA Fine-tune on OpenHermes
## 📖 Overview
This model is a **LoRA fine-tuned version of `meta-llama/Llama-3.2-3B`** on the [OpenHermes dataset](https://huggingface.co/datasets/teknium/OpenHermes-2.5).
The goal of this run was to adapt Llama-3.2-3B for improved instruction-following using a high-quality, multi-domain SFT dataset.
Fine-tuning was performed with **parameter-efficient fine-tuning (PEFT)** using **QLoRA** (a 4-bit quantized base model with LoRA adapters). Only \~0.75% of the model's parameters were trained, keeping compute and memory usage low while still adapting the model effectively.
---
## ⚙️ Training Configuration
**Base Model:** `meta-llama/Llama-3.2-3B`
**Method:** QLoRA (LoRA rank 16, α=32, dropout=0.05)
**Trainable Parameters:** 24.3M / 3.24B (\~0.75%)
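A minimal sketch of the adapter setup implied by these numbers; `target_modules` and the quantization details are assumptions (common choices for Llama-family models), so treat `adapter_config.json` in this repo as the authoritative source:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# QLoRA: load the frozen base model in 4-bit (assumed quantization settings)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-3B",
    quantization_config=bnb_config,
    device_map="auto",
)
base = prepare_model_for_kbit_training(base)  # makes the 4-bit model training-ready

# LoRA hyperparameters from above; target_modules is an assumption
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # should report roughly 24.3M trainable params
```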
**Training Arguments:**
```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./llama_finetune_lora",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,   # effective batch size of 16
    learning_rate=2e-4,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    weight_decay=0.01,
    logging_steps=200,
    evaluation_strategy="steps",
    eval_steps=200,
    save_strategy="steps",
    save_steps=1000,
    save_total_limit=2,
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    greater_is_better=False,
    bf16=True,                       # A100 supports bfloat16
    fp16=False,
    gradient_checkpointing=True,
    torch_compile=False,
    report_to="none",
    seed=42,
)
```
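For context, a hedged sketch of how these arguments might plug into a standard `Trainer` loop; `train_ds` and `eval_ds` are placeholders for tokenized OpenHermes splits, not the actual training script:

```python
from transformers import AutoTokenizer, Trainer, DataCollatorForLanguageModeling

tok = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-3B")
tok.pad_token = tok.eos_token  # Llama tokenizers ship without a pad token

# Causal-LM collator: labels are the input ids (shifted internally), no masking
collator = DataCollatorForLanguageModeling(tokenizer=tok, mlm=False)

trainer = Trainer(
    model=model,             # PEFT-wrapped model from the sketch above
    args=training_args,
    train_dataset=train_ds,  # placeholder: tokenized OpenHermes train split
    eval_dataset=eval_ds,    # placeholder: held-out validation split
    data_collator=collator,
)
trainer.train()
```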
---
## 📊 Training Metrics
The run was stopped at **2000 steps** (\~4.5h on an A100; effective batch size 16 = 2 per device × 8 accumulation steps). Training loss dropped steadily and validation loss stabilized around 0.20.
| Step | Training Loss | Validation Loss |
| ---- | ------------- | --------------- |
| 200 | 1.2781 | 0.2202 |
| 400 | 0.2167 | 0.2134 |
| 600 | 0.2139 | 0.2098 |
| 800 | 0.2120 | 0.2072 |
| 1000 | 0.2085 | 0.2057 |
| 1200 | 0.1996 | 0.2043 |
| 1400 | 0.2056 | 0.2034 |
| 1600 | 0.2016 | 0.2023 |
| 1800 | 0.2000 | 0.2012 |
| 2000 | 0.2027 | 0.2005 |
📉 **Validation loss converged to \~0.20**, indicating effective adaptation.
---
## 🚀 Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = "meta-llama/Llama-3.2-3B"
adapter = "kunjcr2/llama3-3b-lora-openhermes"  # replace with your Hub repo

# Tokenizer files ship with the adapter repo; load the base model, then attach the adapter
tok = AutoTokenizer.from_pretrained(adapter)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype="auto", device_map="auto")
model = PeftModel.from_pretrained(model, adapter)

# Generate
prompt = "Explain the concept of binary search trees."
inputs = tok(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tok.decode(outputs[0], skip_special_tokens=True))
```
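If you prefer a standalone checkpoint (no `peft` dependency at serving time), the adapter can be merged into the base weights. A minimal sketch; the output path is illustrative:

```python
# Merge the LoRA weights into the base model and save a standalone checkpoint
merged = model.merge_and_unload()
merged.save_pretrained("./llama3-3b-openhermes-merged")  # illustrative path
tok.save_pretrained("./llama3-3b-openhermes-merged")
```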
---
## 📌 Notes
* Training was run with **bf16 + gradient checkpointing** on A100 (40GB).
* Only the adapter weights are uploaded (small download); use them together with the base model. For smaller GPUs, see the 4-bit loading sketch after this list.
* Repo includes: `adapter_model.safetensors`, `adapter_config.json`, tokenizer files, and this README.
* Training stopped early at **2000 steps (\~17% of planned)** because validation loss had already plateaued (see the metrics table above).
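For GPUs with less memory than the A100 used in training, the base model can be loaded in 4-bit before attaching the adapter. A minimal sketch, assuming `bitsandbytes` is installed; the quantization settings are assumptions mirroring a typical QLoRA setup:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

# 4-bit base model for low-memory inference (assumed settings)
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-3B",
    quantization_config=bnb,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, "kunjcr2/llama3-3b-lora-openhermes")
```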
---
✨ If you like this model, feel free to try it out and extend training. Future runs could include more steps, preference tuning (DPO/ORPO), or domain-specific mixtures. |