---
library_name: transformers
tags:
- Assistant
license: apache-2.0
datasets:
- teknium/OpenHermes-2.5
language:
- en
base_model:
- meta-llama/Llama-3.2-3B
---

# Llama-3.2-3B LoRA Fine-tune on OpenHermes

## 📖 Overview

This model is a **LoRA fine-tuned version of `meta-llama/Llama-3.2-3B`** trained on the [OpenHermes dataset](https://huggingface.co/datasets/teknium/OpenHermes-2.5). The goal of this run was to adapt Llama-3.2-3B for improved instruction following using a high-quality, multi-domain SFT dataset.

Training used **parameter-efficient fine-tuning (PEFT)** with **LoRA adapters**: only ~0.75% of the model's parameters were updated, keeping compute and memory usage low while still yielding strong gains.

---

## ⚙️ Training Configuration

**Base Model:** `meta-llama/Llama-3.2-3B`

**Method:** QLoRA (LoRA rank 16, α=32, dropout=0.05)

**Trainable Parameters:** 24.3M / 3.24B (~0.75%)
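
For context, here is a minimal sketch of the adapter and quantization setup these hyperparameters imply, using `peft` and `bitsandbytes`. The `target_modules` list is an assumption: adapting all attention and MLP projections is consistent with the stated ~24.3M trainable parameters, but the card does not list the exact modules.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit NF4 quantization for QLoRA (assumed setup; the card only says "QLoRA")
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-3B",
    quantization_config=bnb_config,
    device_map="auto",
)

# LoRA hyperparameters as stated above; target_modules is an assumed choice
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # should report roughly 0.75% trainable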

**Training Arguments:**

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./llama_finetune_lora",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,   # effective batch size of 16 per device
    learning_rate=2e-4,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    weight_decay=0.01,
    logging_steps=200,

    # Evaluation / checkpointing
    evaluation_strategy="steps",     # newer transformers versions rename this to `eval_strategy`
    eval_steps=200,
    save_strategy="steps",
    save_steps=1000,
    save_total_limit=2,
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    greater_is_better=False,

    # Precision / memory
    bf16=True,                       # A100 supports bfloat16
    fp16=False,
    gradient_checkpointing=True,
    torch_compile=False,
    report_to="none",
    seed=42,
)
```
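
For completeness, one way these arguments could be wired into a standard `Trainer` run is sketched below; the `tokenized_train` / `tokenized_eval` splits and the collator are placeholders, not the exact training script.

```python
from transformers import Trainer, DataCollatorForLanguageModeling

# `tokenized_train` / `tokenized_eval` are hypothetical pre-tokenized splits of
# OpenHermes; the actual preprocessing pipeline is not part of this card.
# mlm=False gives standard causal-LM labels (inputs shifted by one).
collator = DataCollatorForLanguageModeling(tokenizer=tok, mlm=False)

trainer = Trainer(
    model=model,                  # the PEFT-wrapped model from the sketch above
    args=training_args,
    train_dataset=tokenized_train,
    eval_dataset=tokenized_eval,
    data_collator=collator,
)

trainer.train()
model.save_pretrained("./llama_finetune_lora/adapter")  # saves only the LoRA adapter
```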

---

## 📊 Training Metrics

The run was stopped at **2000 steps** (~4.5 h on an A100). Training loss dropped sharply within the first 200 steps, and validation loss steadily improved before stabilizing around 0.20.

| Step | Training Loss | Validation Loss |
| ---- | ------------- | --------------- |
| 200  | 1.2781        | 0.2202          |
| 400  | 0.2167        | 0.2134          |
| 600  | 0.2139        | 0.2098          |
| 800  | 0.2120        | 0.2072          |
| 1000 | 0.2085        | 0.2057          |
| 1200 | 0.1996        | 0.2043          |
| 1400 | 0.2056        | 0.2034          |
| 1600 | 0.2016        | 0.2023          |
| 1800 | 0.2000        | 0.2012          |
| 2000 | 0.2027        | 0.2005          |

📉 **Validation loss converged near ~0.20**, indicating effective adaptation.

---

## 🚀 Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = "meta-llama/Llama-3.2-3B"
adapter = "kunjcr2/llama3-3b-lora-openhermes"  # replace with your Hub repo

# Load base model + LoRA adapter
tok = AutoTokenizer.from_pretrained(adapter)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype="auto", device_map="auto")
model = PeftModel.from_pretrained(model, adapter)

# Generate
prompt = "Explain the concept of binary search trees."
inputs = tok(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tok.decode(outputs[0], skip_special_tokens=True))
```
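
If you prefer a standalone checkpoint instead of loading the adapter at runtime, the adapter can be folded into the base weights with `peft`'s `merge_and_unload`; a minimal sketch, where the output path is just an example:

```python
# Merge the LoRA weights into the base model and drop the PEFT wrapper
merged = model.merge_and_unload()

# Save a self-contained model + tokenizer (path is arbitrary)
merged.save_pretrained("./llama3-3b-openhermes-merged")
tok.save_pretrained("./llama3-3b-openhermes-merged")
```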

---

## 📌 Notes

* Training ran with **bf16 + gradient checkpointing** on a single A100 (40 GB).
* Only the LoRA adapters are uploaded (small download size); use them together with the base model.
* The repo includes `adapter_model.safetensors`, `adapter_config.json`, the tokenizer files, and this README.
* Training was stopped early at **2000 steps (~17% of the planned run)** because validation loss had converged.

---

✨ If you like this model, feel free to try it out and extend the training. Future runs could add more steps, preference tuning (DPO/ORPO), or domain-specific data mixtures.