---
library_name: transformers
tags:
- Assistant
license: apache-2.0
datasets:
- teknium/OpenHermes-2.5
language:
- en
base_model:
- meta-llama/Llama-3.2-3B
---

# Llama-3.2-3B LoRA Fine-tune on OpenHermes

## 📖 Overview

This model is a **LoRA fine-tuned version of `meta-llama/Llama-3.2-3B`** trained on the [OpenHermes dataset](https://huggingface.co/datasets/teknium/OpenHermes-2.5). The goal of this run was to adapt Llama-3.2-3B for improved instruction following using a high-quality, multi-domain SFT dataset.

Fine-tuning was performed with **parameter-efficient fine-tuning (PEFT)** using **LoRA adapters**. Only ~0.75% of model parameters were trained, keeping compute and memory usage efficient while still yielding strong gains.

---

## ⚙️ Training Configuration

**Base Model:** `meta-llama/Llama-3.2-3B`
**Method:** QLoRA (LoRA rank 16, α=32, dropout=0.05); see the PEFT setup sketch at the end of this card
**Trainable Parameters:** 24.3M / 3.24B (~0.75%)

**Training Arguments:**

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./llama_finetune_lora",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    learning_rate=2e-4,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    weight_decay=0.01,
    logging_steps=200,
    evaluation_strategy="steps",
    eval_steps=200,
    save_strategy="steps",
    save_steps=1000,
    save_total_limit=2,
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    greater_is_better=False,
    bf16=True,  # A100 support
    fp16=False,
    gradient_checkpointing=True,
    torch_compile=False,
    report_to="none",
    seed=42,
)
```

---

## 📊 Training Metrics

The run was stopped at **2000 steps** (~4.5 h on an A100). Training loss improved steadily and validation loss stabilized around 0.20.

| Step | Training Loss | Validation Loss |
| ---- | ------------- | --------------- |
| 200  | 1.2781        | 0.2202          |
| 400  | 0.2167        | 0.2134          |
| 600  | 0.2139        | 0.2098          |
| 800  | 0.2120        | 0.2072          |
| 1000 | 0.2085        | 0.2057          |
| 1200 | 0.1996        | 0.2043          |
| 1400 | 0.2056        | 0.2034          |
| 1600 | 0.2016        | 0.2023          |
| 1800 | 0.2000        | 0.2012          |
| 2000 | 0.2027        | 0.2005          |

📉 **Validation loss converged near ~0.20**, indicating effective adaptation.

---

## 🚀 Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = "meta-llama/Llama-3.2-3B"
adapter = "kunjcr2/llama3-3b-lora-openhermes"  # replace with your Hub repo

# Load the tokenizer from the adapter repo (it ships tokenizer files),
# then load the base model and attach the LoRA adapter
tok = AutoTokenizer.from_pretrained(adapter)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype="auto", device_map="auto")
model = PeftModel.from_pretrained(model, adapter)

# Generate
prompt = "Explain the concept of binary search trees."
inputs = tok(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tok.decode(outputs[0], skip_special_tokens=True))
```

---

## 📌 Notes

* Training was run with **bf16 + gradient checkpointing** on an A100 (40 GB).
* Only the LoRA adapters are uploaded (small download). Use them together with the base model, or merge them first (see the merging sketch at the end of this card).
* The repo includes `adapter_model.safetensors`, `adapter_config.json`, tokenizer files, and this README.
* Training stopped early at **2000 steps (~17% of the planned schedule)** because validation loss had converged.

---

✨ If you like this model, feel free to try it out and extend the training. Future runs could include more steps, preference tuning (DPO/ORPO), or domain-specific data mixtures.
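
---

## 🧩 Appendix: PEFT Setup Sketch

The card above lists only the `TrainingArguments`. The following is a minimal sketch of the adapter and quantization setup implied by the stated QLoRA settings (rank 16, α=32, dropout 0.05, bf16 compute). The `target_modules` list is an assumption, not stated on this card: targeting all Llama attention and MLP projections is a common choice and is consistent with the ~24.3M trainable parameters reported above, but the actual run may have used a different set.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit quantized base model (the "Q" in QLoRA); bf16 compute to match training
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-3B",
    quantization_config=bnb_config,
    device_map="auto",
)

# LoRA hyperparameters as stated on this card; target_modules is an assumption
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention projections
        "gate_proj", "up_proj", "down_proj",     # MLP projections
    ],
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # should report roughly 24.3M trainable
```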
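
## 🔁 Appendix: Merging the Adapter (Optional)

Since only the adapter weights are uploaded, deployments that prefer a single standalone checkpoint can fold the LoRA weights into the base model. A minimal sketch, reusing the repo names from the Usage section (the output directory name is arbitrary):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = "meta-llama/Llama-3.2-3B"
adapter = "kunjcr2/llama3-3b-lora-openhermes"  # replace with your Hub repo

model = AutoModelForCausalLM.from_pretrained(base, torch_dtype="auto")
model = PeftModel.from_pretrained(model, adapter)

# Fold the LoRA deltas into the base weights and drop the PEFT wrapper
merged = model.merge_and_unload()
merged.save_pretrained("./llama3-3b-openhermes-merged")

# Keep the tokenizer alongside the merged weights
tok = AutoTokenizer.from_pretrained(adapter)
tok.save_pretrained("./llama3-3b-openhermes-merged")
```

The merged model can then be loaded with `AutoModelForCausalLM.from_pretrained("./llama3-3b-openhermes-merged")` without any PEFT dependency.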