---
library_name: transformers
tags:
- Assistant
license: apache-2.0
datasets:
- teknium/OpenHermes-2.5
language:
- en
base_model:
- meta-llama/Llama-3.2-3B
---

# Llama-3.2-3B LoRA Fine-tune on OpenHermes

## 📖 Overview

This model is a **LoRA fine-tuned version of `meta-llama/Llama-3.2-3B`** trained on the [OpenHermes dataset](https://huggingface.co/datasets/teknium/OpenHermes-2.5). The goal of this run was to adapt Llama-3.2-3B for improved instruction following using a high-quality, multi-domain SFT dataset.

Training used **parameter-efficient fine-tuning (PEFT)** with **LoRA adapters**: only ~0.75% of the model's parameters were updated, keeping compute and memory usage low while still yielding strong gains.

---

## ⚙️ Training Configuration

**Base Model:** `meta-llama/Llama-3.2-3B`

**Method:** QLoRA (LoRA rank 16, α=32, dropout=0.05)

**Trainable Parameters:** 24.3M / 3.24B (~0.75%)
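
For context, here is a minimal sketch of the adapter and quantization setup these hyperparameters imply, using `peft` and `bitsandbytes`. The `target_modules` list is an assumption: adapting all attention and MLP projections is consistent with the stated ~24.3M trainable parameters, but the card does not list the exact modules.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit NF4 quantization for QLoRA (assumed setup; the card only says "QLoRA")
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-3B",
    quantization_config=bnb_config,
    device_map="auto",
)

# LoRA hyperparameters as stated above; target_modules is an assumed choice
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # should report roughly 0.75% trainable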

**Training Arguments:**

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./llama_finetune_lora",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,   # effective batch size of 16 per device
    learning_rate=2e-4,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    weight_decay=0.01,
    logging_steps=200,

    # Evaluation / checkpointing
    evaluation_strategy="steps",     # newer transformers versions rename this to `eval_strategy`
    eval_steps=200,
    save_strategy="steps",
    save_steps=1000,
    save_total_limit=2,
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    greater_is_better=False,

    # Precision / memory
    bf16=True,                       # A100 supports bfloat16
    fp16=False,
    gradient_checkpointing=True,
    torch_compile=False,
    report_to="none",
    seed=42,
)
```
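
For completeness, one way these arguments could be wired into a standard `Trainer` run is sketched below; the `tokenized_train` / `tokenized_eval` splits and the collator are placeholders, not the exact training script.

```python
from transformers import Trainer, DataCollatorForLanguageModeling

# `tokenized_train` / `tokenized_eval` are hypothetical pre-tokenized splits of
# OpenHermes; the actual preprocessing pipeline is not part of this card.
# mlm=False gives standard causal-LM labels (inputs shifted by one).
collator = DataCollatorForLanguageModeling(tokenizer=tok, mlm=False)

trainer = Trainer(
    model=model,                  # the PEFT-wrapped model from the sketch above
    args=training_args,
    train_dataset=tokenized_train,
    eval_dataset=tokenized_eval,
    data_collator=collator,
)

trainer.train()
model.save_pretrained("./llama_finetune_lora/adapter")  # saves only the LoRA adapter
```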

---

## 📊 Training Metrics

The run was stopped at **2000 steps** (~4.5 h on an A100). Training loss dropped sharply within the first 200 steps, and validation loss steadily improved before stabilizing around 0.20.

| Step | Training Loss | Validation Loss |
| ---- | ------------- | --------------- |
| 200  | 1.2781        | 0.2202          |
| 400  | 0.2167        | 0.2134          |
| 600  | 0.2139        | 0.2098          |
| 800  | 0.2120        | 0.2072          |
| 1000 | 0.2085        | 0.2057          |
| 1200 | 0.1996        | 0.2043          |
| 1400 | 0.2056        | 0.2034          |
| 1600 | 0.2016        | 0.2023          |
| 1800 | 0.2000        | 0.2012          |
| 2000 | 0.2027        | 0.2005          |

📉 **Validation loss converged near ~0.20**, indicating effective adaptation.

---

## 🚀 Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = "meta-llama/Llama-3.2-3B"
adapter = "kunjcr2/llama3-3b-lora-openhermes"  # replace with your Hub repo

# Load base model + LoRA adapter
tok = AutoTokenizer.from_pretrained(adapter)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype="auto", device_map="auto")
model = PeftModel.from_pretrained(model, adapter)

# Generate
prompt = "Explain the concept of binary search trees."
inputs = tok(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tok.decode(outputs[0], skip_special_tokens=True))
```
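
If you prefer a standalone checkpoint instead of loading the adapter at runtime, the adapter can be folded into the base weights with `peft`'s `merge_and_unload`; a minimal sketch, where the output path is just an example:

```python
# Merge the LoRA weights into the base model and drop the PEFT wrapper
merged = model.merge_and_unload()

# Save a self-contained model + tokenizer (path is arbitrary)
merged.save_pretrained("./llama3-3b-openhermes-merged")
tok.save_pretrained("./llama3-3b-openhermes-merged")
```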

---

## 📌 Notes

* Training ran with **bf16 + gradient checkpointing** on a single A100 (40 GB).
* Only the LoRA adapters are uploaded (small download size); use them together with the base model.
* The repo includes `adapter_model.safetensors`, `adapter_config.json`, the tokenizer files, and this README.
* Training was stopped early at **2000 steps (~17% of the planned run)** because validation loss had converged.

---

✨ If you like this model, feel free to try it out and extend the training. Future runs could add more steps, preference tuning (DPO/ORPO), or domain-specific data mixtures.