---
base_model: meta-llama/Llama-3.1-70B-Instruct
library_name: peft
license: llama3.1
datasets:
- tatsu-lab/alpaca
language:
- en
tags:
- llm
- text-to-text
- text-generation-inference
- conversational
- llama70b
- lora
- adapters
---
# 🧠 Model Card: `pranjalsingh/alpaca-Llama-3.1-70B-Instruct-chat`
A LoRA fine-tune of **meta-llama/Llama-3.1-70B-Instruct** on the **Alpaca dataset**, trained with **PEFT** and accelerated on **Intel Gaudi3 HPU** hardware.
---
## 📝 Model Summary
This model is a fine-tuned variant of Llama 3.1 70B Instruct, trained on the Alpaca dataset using Parameter-Efficient Fine-Tuning (PEFT) via LoRA. The goal was to improve instruction-following performance while keeping the compute and memory footprint small, leveraging Intel Gaudi3 HPUs for efficient training.
---
## 📄 Model Details
* **Base Model:** `meta-llama/Llama-3.1-70B-Instruct`
* **Fine-tuned Model:** `pranjalsingh/alpaca-Llama-3.1-70B-Instruct-chat`
* **Fine-tuned By:** *Pranjal Singh Thakur*
* **Dataset:** Stanford Alpaca (`tatsu-lab/alpaca`; see the loading snippet after this list)
* **PEFT Library:** PEFT v0.12.0
* **Fine-tuning Technique:** LoRA
* **Epochs:** 2
* **Training Hardware:** 1 Node with 8× Intel Gaudi3 HPUs
* **Language(s):** English
* **License:** Same as the base model (Llama 3.1 Community License)
* **Credit:** Intel for providing Gaudi3 HPU infrastructure
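The Alpaca data is available on the Hub as `tatsu-lab/alpaca` (also listed in the metadata above); a quick way to inspect it, assuming the `datasets` library is installed:

```python
from datasets import load_dataset

# Each record has "instruction", "input", and "output" fields.
ds = load_dataset("tatsu-lab/alpaca", split="train")
print(len(ds))               # ~52k examples
print(ds[0]["instruction"])
```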
---
## 🚀 Usage
### Direct Use
Use the model as an instruction-following chatbot or in downstream applications requiring LLM completion with lightweight deployment using LoRA adapters.
```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model in bf16, sharded across available devices
# (a 70B model will not fit on a single accelerator otherwise).
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-70B-Instruct", torch_dtype=torch.bfloat16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-70B-Instruct")

# Attach the LoRA adapters on top of the frozen base weights.
model = PeftModel.from_pretrained(base_model, "pranjalsingh/alpaca-Llama-3.1-70B-Instruct-chat")

# Alpaca-style instruction prompt.
prompt = "### Instruction:\nExplain quantum computing in simple terms.\n\n### Response:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
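For deployment without a runtime PEFT dependency, the adapters can also be folded into the base weights. A minimal sketch, assuming `model` and `tokenizer` from the snippet above (the output directory name is illustrative):

```python
# merge_and_unload() applies the LoRA deltas to the base weights and
# returns a plain Transformers model, so inference no longer needs PEFT.
merged = model.merge_and_unload()
merged.save_pretrained("alpaca-llama-3.1-70b-instruct-merged")
tokenizer.save_pretrained("alpaca-llama-3.1-70b-instruct-merged")
```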
---
## 📊 Evaluation Results
| Metric | Value |
| ---------------------- | --------- |
| Eval Accuracy | 73.27% |
| Eval Loss | 1.02 |
| Perplexity | 2.79 |
| Evaluation Runtime | 20.97s |
| Samples Evaluated | 101 |
| Samples/Sec | 4.82 |
| Max Memory Used (GB) | 126.2 |
| Total Available Memory (GB) | 126.54 |
| Memory Allocated (GB) | 41.06 |
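As a quick consistency check, perplexity is the exponential of the cross-entropy eval loss, so the two reported values line up (the small gap comes from the loss being rounded to two decimals):

```python
import math

# Perplexity = exp(cross-entropy loss).
# exp(1.02) ≈ 2.77, matching the reported 2.79 up to loss rounding.
print(math.exp(1.02))  # 2.7731...
```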
---
## 🛠 Training Configuration
* **Epochs:** 2
* **Precision:** Not explicitly stated; bf16 mixed precision is typical for Gaudi3 training
* **Hardware:** Intel Gaudi3 HPU (8 cards, 1 node)
* **Frameworks:** PEFT, Hugging Face Transformers
* **Batching & Tokenization:** Not explicitly provided (a minimal LoRA configuration sketch follows below)
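The exact LoRA hyperparameters were not published. Below is a minimal PEFT sketch of how such a run is typically configured; the rank, alpha, dropout, and target modules are illustrative assumptions, not the values used for this checkpoint:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Illustrative LoRA hyperparameters -- the actual values used for this
# checkpoint were not published.
lora_config = LoraConfig(
    r=16,                                 # adapter rank (assumption)
    lora_alpha=32,                        # scaling factor (assumption)
    lora_dropout=0.05,                    # assumption
    target_modules=["q_proj", "v_proj"],  # common choice for LLaMA-style models
    task_type="CAUSAL_LM",
)

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-70B-Instruct")
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```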
---
## 📦 Model Sources
* **Repository:** [Hugging Face Model Card](https://huggingface.co/pranjalsingh/alpaca-Llama-3.1-70B-Instruct-chat)
* **Dataset:** [Stanford Alpaca](https://github.com/tatsu-lab/stanford_alpaca)
* **Base Model:** [`meta-llama/Llama-3.1-70B-Instruct`](https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct)
---
## ⚠️ Limitations & Risks
* Not suitable for multilingual tasks (trained only on English data).
* May reflect biases present in the Alpaca dataset.
* Not recommended for sensitive or safety-critical applications.
* Fine-tuning targeted instruction-following tasks and may not generalize to other domains.
---
## ♻️ Environmental Impact
| Parameter | Value |
| ----------------- | ----------------------------------------------------------- |
| Compute Platform | Intel Gaudi3 |
| Cards Used | 8× HPU |
| Training Duration | ~2 epochs |
| Region | [More info needed] |
| Emission Estimate | Use [MLCO2](https://mlco2.github.io/impact) to calculate |
---
## 👨‍💻 Author & Acknowledgment
* **Author:** Pranjal Singh Thakur
* **Credit:** Intel (for compute resources using Gaudi3 HPU)
---
## 🔖 Citation
Coming soon.