Phi-3 Mini 4K Instruct - Alpaca LoRA Fine-tuned
This model is a fine-tuned version of microsoft/Phi-3-mini-4k-instruct using LoRA (Low-Rank Adaptation) on the Alpaca dataset.
Model Details
- Base Model: microsoft/Phi-3-mini-4k-instruct
- Fine-tuning Method: LoRA (Low-Rank Adaptation)
- Dataset: tatsu-lab/alpaca (52,002 instruction-following examples; a loading example follows this list)
- Training Duration: ~1.24 hours
- Final Training Loss: 1.0445
- Average Training Loss: 1.0311
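The dataset can be loaded directly from the Hugging Face Hub for inspection. A minimal sketch, not part of this repository:

from datasets import load_dataset

# Alpaca ships a single "train" split with 52,002 records
dataset = load_dataset("tatsu-lab/alpaca", split="train")
print(len(dataset))              # 52002
print(dataset.column_names)      # ['instruction', 'input', 'output', 'text']
print(dataset[0]["instruction"])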
Training Configuration
- LoRA Rank: 16
- LoRA Alpha: 32
- LoRA Dropout: 0.05
- Target Modules: qkv_proj, o_proj, gate_proj, up_proj, down_proj
- Learning Rate: 1e-5
- Batch Size: 2 (gradient accumulation steps: 8; effective batch size 16)
- Epochs: 1
- Precision: bfloat16
- Gradient Checkpointing: Enabled (a configuration sketch follows this list)
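The hyperparameters above map onto the standard peft and transformers configuration objects roughly as follows. This is an illustrative sketch, not the exact training script used for this model; output_dir is a placeholder.

from peft import LoraConfig
from transformers import TrainingArguments

# LoRA adapter configuration matching the values listed above
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["qkv_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)

# Optimizer and runtime settings matching the values listed above
training_args = TrainingArguments(
    output_dir="phi3-alpaca-lora",        # placeholder path
    learning_rate=1e-5,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    num_train_epochs=1,
    bf16=True,
    gradient_checkpointing=True,
)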
Usage
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct", trust_remote_code=True)
# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
# Load LoRA adapters
model = PeftModel.from_pretrained(base_model, "johnlam90/phi3-mini-4k-instruct-alpaca-lora")
model.eval()
# Format prompt
prompt = "Give three tips for staying healthy."
formatted_prompt = f'''### Instruction:
{prompt}
### Response:
'''
# Generate
inputs = tokenizer(formatted_prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=200,
        do_sample=False,
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=tokenizer.eos_token_id,
    )
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response.split("### Response:")[1].strip())
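Optionally, the LoRA weights can be merged into the base model for standalone deployment (no peft dependency at serving time). A brief sketch using PEFT's merge_and_unload; the output path is only an example:

# Fold the adapter weights into the base model and save a standalone checkpoint
merged_model = model.merge_and_unload()
merged_model.save_pretrained("phi3-mini-4k-instruct-alpaca-merged")   # example path
tokenizer.save_pretrained("phi3-mini-4k-instruct-alpaca-merged")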
Performance
The model was validated with the following stability and quality checks:
- ✅ NaN clamp protection for stable generation (an inference-time sketch follows this list)
- ✅ Proper bfloat16 precision handling
- ✅ Consistent and coherent responses across multiple test prompts
- ✅ No numerical instabilities during training or inference
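The card does not specify how the NaN clamp was implemented. For a comparable safeguard at inference time, transformers provides a logits processor that strips NaN/Inf values during decoding; the sketch below reuses model, inputs, and tokenizer from the Usage section:

from transformers import InfNanRemoveLogitsProcessor, LogitsProcessorList

# Replace any NaN/Inf logits before decoding so generation cannot crash on them
outputs = model.generate(
    **inputs,
    max_new_tokens=200,
    do_sample=False,
    logits_processor=LogitsProcessorList([InfNanRemoveLogitsProcessor()]),
    eos_token_id=tokenizer.eos_token_id,
    pad_token_id=tokenizer.eos_token_id,
)

Passing remove_invalid_values=True to generate should have the same effect, since it adds this processor internally.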
Training Details
This model was fine-tuned with careful attention to:
- Data Formatting: Proper Alpaca instruction/input/output structure (a prompt-template sketch follows this list)
- Numerical Stability: Using bfloat16 precision and conservative hyperparameters
- Memory Efficiency: Gradient checkpointing and optimized batch sizes
- Safety Measures: NaN protection and proper token handling
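For reference, the Alpaca instruction/input/output structure mentioned above is conventionally rendered into prompts matching the template shown in the Usage section. The helper below is an illustrative sketch with a made-up record, not the actual formatting code used for training:

def format_alpaca(example):
    """Render one Alpaca record (instruction/input/output) into a training prompt."""
    if example.get("input"):
        return (
            f"### Instruction:\n{example['instruction']}\n"
            f"### Input:\n{example['input']}\n"
            f"### Response:\n{example['output']}"
        )
    return (
        f"### Instruction:\n{example['instruction']}\n"
        f"### Response:\n{example['output']}"
    )

# Hypothetical record, for illustration only
print(format_alpaca({
    "instruction": "Give three tips for staying healthy.",
    "input": "",
    "output": "Eat a balanced diet, exercise regularly, and get enough sleep.",
}))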
License
This model is released under the MIT license, following the base model's licensing terms.