---
base_model: meta-llama/Llama-3.1-70B-Instruct
library_name: peft
license: mit
datasets:
- tatsu-lab/alpaca
language:
- en
- hi
- ja
- ta
- te
- mr
tags:
- llm
- text-to-text
- text-generation-inference
- conversational
- llama70b
- lora
- adapters
---

# 🧠 Model Card: `pranjalsingh/alpaca-Llama-3.1-70B-Instruct-chat`

A LoRA fine-tuned version of the **meta-llama/Llama-3.1-70B-Instruct** model on the **Alpaca dataset**, optimized using **PEFT** and accelerated on **Intel Gaudi3 HPU** hardware.

---

## 📝 Model Summary

This model is a fine-tuned variant of LLaMA 3.1 70B Instruct, trained on the Alpaca dataset using Parameter-Efficient Fine-Tuning (PEFT) via LoRA. The goal of this fine-tuning was to improve instruction-following performance while keeping resource requirements light, leveraging Intel's Gaudi3 HPUs for efficient training.

---

## 📄 Model Details

* **Base Model:** `meta-llama/Llama-3.1-70B-Instruct`
* **Fine-tuned Model:** `pranjalsingh/alpaca-Llama-3.1-70B-Instruct-chat`
* **Fine-tuned By:** *Pranjal Singh Thakur*
* **Dataset:** Stanford Alpaca dataset
* **PEFT Library:** PEFT v0.12.0
* **Fine-tuning Technique:** LoRA
* **Epochs:** 2
* **Training Hardware:** 1 node with 8× Intel Gaudi3 HPUs
* **Language(s):** English
* **License:** Same as base model (Llama 3.1)
* **Credit:** Intel for providing Gaudi3 HPU infrastructure

---

## 🚀 Usage

### Direct Use

Use the model as an instruction-following chatbot, or in downstream applications that need LLM completions with lightweight deployment via LoRA adapters. (A sketch of merging the adapter into the base weights for standalone deployment appears after the limitations section.)

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# bf16 and automatic device placement keep the 70B base model within memory
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-70B-Instruct", torch_dtype=torch.bfloat16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-70B-Instruct")

# Attach the LoRA adapter on top of the base model
model = PeftModel.from_pretrained(base_model, "pranjalsingh/alpaca-Llama-3.1-70B-Instruct-chat")

inputs = tokenizer("### Instruction: Explain quantum computing in simple terms.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

---

## 📊 Evaluation Results

| Metric                 | Value     |
| ---------------------- | --------- |
| Eval accuracy          | 73.27%    |
| Eval loss              | 1.02      |
| Perplexity             | 2.79      |
| Evaluation runtime     | 20.97 s   |
| Samples evaluated      | 101       |
| Samples/sec            | 4.82      |
| Max memory used        | 126.2 GB  |
| Total available memory | 126.54 GB |
| Memory allocated       | 41.06 GB  |

---

## 🛠 Training Configuration

* **Epochs:** 2
* **Precision:** Likely mixed precision (bf16/fp16 on Gaudi3)
* **Hardware:** Intel Gaudi3 HPU (8 cards, 1 node)
* **Frameworks:** PEFT, Hugging Face Transformers
* **Batching & Tokenization:** Not explicitly provided

An illustrative LoRA configuration sketch (with assumed hyperparameters) is provided after the limitations section below.

---

## 📦 Model Sources

* **Repository:** [Hugging Face Model Card](https://huggingface.co/pranjalsingh/alpaca-Llama-3.1-70B-Instruct-chat)
* **Dataset:** [Stanford Alpaca](https://github.com/tatsu-lab/stanford_alpaca)
* **Base Model:** [`meta-llama/Llama-3.1-70B-Instruct`](https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct)

---

## ⚠️ Limitations & Risks

* Not suitable for multilingual tasks (fine-tuned only on English data).
* May reflect biases present in the Alpaca dataset.
* Not recommended for sensitive or safety-critical applications.
* Fine-tuning targeted instruction-following tasks and may not generalize to other domains.
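---

## 🧪 Illustrative LoRA Configuration

The exact LoRA hyperparameters (rank, alpha, dropout, target modules, learning rate, batch size) are not reported in this card. The sketch below only shows how a PEFT LoRA adapter for a LLaMA-style model is typically set up; every value here is an assumption, not the configuration used to train this checkpoint.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Hypothetical hyperparameters -- NOT the values used for this model
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed attention projections
    task_type="CAUSAL_LM",
)

base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-70B-Instruct")
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # only the small adapter matrices are trainable
```

Wrapping the base model this way means only the injected low-rank matrices receive gradients, which is what keeps the fine-tuning footprint small relative to full 70B training.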
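---

## 🧩 Merging the Adapter for Standalone Deployment

If you prefer to serve the model without a `peft` dependency at inference time, the adapter can be folded into the base weights with `merge_and_unload()`. This is a minimal sketch; the output directory name is illustrative.

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-70B-Instruct", torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, "pranjalsingh/alpaca-Llama-3.1-70B-Instruct-chat")

# Fold the LoRA weights into the base model so it behaves like a plain
# transformers checkpoint (no peft required when loading it later)
merged = model.merge_and_unload()
merged.save_pretrained("alpaca-llama-3.1-70b-merged")  # illustrative output path

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-70B-Instruct")
tokenizer.save_pretrained("alpaca-llama-3.1-70b-merged")
```

The merged checkpoint trades the small adapter footprint for a full set of 70B weights, so keep the adapter-only loading path from the Usage section if disk space or sharing granularity matters.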
---

## ♻️ Environmental Impact

| Parameter         | Value                                                       |
| ----------------- | ----------------------------------------------------------- |
| Compute Platform  | Intel Gaudi3                                                |
| Cards Used        | 8× HPU                                                      |
| Training Duration | ~2 epochs                                                   |
| Region            | [More info needed]                                          |
| Emission Estimate | [Use [MLCO2](https://mlco2.github.io/impact) to calculate]  |

---

## 👨‍💻 Author & Acknowledgment

* **Author:** Pranjal Singh Thakur
* **Credit:** Intel (for compute resources using Gaudi3 HPUs)

---

## 🔖 Citation

Coming soon.