Commit 3a57615
Parent(s): df3f221
Update README.md

README.md CHANGED

@@ -31,6 +31,8 @@ colab_code_generator_FT_code_gen_UT, an instruction-following large language mod

# Getting Started

+
+## Installation
Loading the fine-tuned Code Generator:
```
from peft import AutoPeftModelForCausalLM
@@ -38,6 +40,50 @@ test_model_UT = AutoPeftModelForCausalLM.from_pretrained("01GangaPutraBheeshma/c
test_tokenizer_UT = AutoTokenizer.from_pretrained("01GangaPutraBheeshma/colab_code_generator_FT_code_gen_UT")
```

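For a quick end-to-end check, the loading snippet above can be combined with a generate call, as sketched below. The repository id is the one used in this README; the prompt text and generation settings are illustrative placeholders, not values from the README.

```
# Load the fine-tuned adapter and its tokenizer, then generate code for a prompt.
# Prompt wording and max_new_tokens are placeholders chosen for illustration.
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

repo_id = "01GangaPutraBheeshma/colab_code_generator_FT_code_gen_UT"
test_model_UT = AutoPeftModelForCausalLM.from_pretrained(repo_id)
test_tokenizer_UT = AutoTokenizer.from_pretrained(repo_id)

prompt = "### Instruction:\nWrite a Python function that reverses a string.\n\n### Response:\n"
inputs = test_tokenizer_UT(prompt, return_tensors="pt")

with torch.no_grad():
    output_ids = test_model_UT.generate(**inputs, max_new_tokens=128)
print(test_tokenizer_UT.decode(output_ids[0], skip_special_tokens=True))
```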
+## Usage
+For re-training this model, I would highly recommend using the following format to provide input to the tokenizer:
+
+```
+def prompt_instruction_format(sample):
+    return f"""### Instruction:
+Use the Task below and the Input given to write the Response, which is a programming code that can solve the following Task:
+
+### Task:
+{sample['instruction']}
+
+### Input:
+{sample['input']}
+
+### Response:
+{sample['output']}
+"""
+```
+
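As a quick sanity check of the formatter, it can be applied to a hand-written sample. The dictionary below only mirrors the keys the function expects; its values are made up for illustration.

```
# Hypothetical sample with the keys used by prompt_instruction_format above.
sample = {
    "instruction": "Write a function that returns the factorial of n.",
    "input": "n = 5",
    "output": "def factorial(n):\n    return 1 if n <= 1 else n * factorial(n - 1)",
}

# Prints the fully formatted training prompt for this sample.
print(prompt_instruction_format(sample))
```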
+Then, we can use the function above to format the input prompts that are pre-processed and used for model training with the Supervised Fine-Tuning (SFTTrainer) class.
+
+```
+trainer = SFTTrainer(
+    model=model,
+    train_dataset=code_dataset,
+    peft_config=peft_config,
+    max_seq_length=2048,
+    tokenizer=tokenizer,
+    packing=True,
+    formatting_func=prompt_instruction_format,
+    args=trainingArgs,
+)
+
+```
+
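The SFTTrainer call above assumes that model, code_dataset, peft_config, tokenizer, and trainingArgs already exist. One plausible way to construct the configuration objects is sketched below; the dataset path and every hyperparameter value are placeholders, not the settings actually used to train this model.

```
# Illustrative setup for the objects referenced by SFTTrainer above.
from datasets import load_dataset
from peft import LoraConfig
from transformers import TrainingArguments

# Placeholder dataset file with instruction/input/output fields.
code_dataset = load_dataset("json", data_files="code_instructions.json", split="train")

# Placeholder LoRA configuration for causal language modeling.
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

# Placeholder training hyperparameters.
trainingArgs = TrainingArguments(
    output_dir="colab_code_generator_FT_code_gen_UT",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    learning_rate=2e-4,
    logging_steps=10,
    save_strategy="epoch",
)
```

With these objects in place, calling trainer.train() launches the fine-tuning run.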
+This is a crucial step when we perform Reinforcement Learning from Human Feedback (RLHF). Here are six reasons why it is important:
+1. Sample Efficiency
+2. Task Adaptation
+3. Transfer Learning
+4. Human Guidance
+5. Reducing Exploration Challenges
+6. Addressing Distribution Shift
+
+
# Documentation

This model was fine-tuned using LoRA because I wanted the model's weights to be efficient at solving other types of Python problems (ones that were not included in the training data).
@@ -75,3 +121,4 @@ bnb_config = BitsAndBytesConfig(



+
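The header of the last hunk references bnb_config = BitsAndBytesConfig(, which suggests the base model is loaded in quantized form before the LoRA fine-tuning described above. A rough sketch of such a setup follows; the base checkpoint name and all quantization flags are assumptions, not values taken from this README.

```
# Hypothetical 4-bit quantized base-model load for LoRA fine-tuning.
# "base-model-name" is a placeholder; the README's actual bnb_config values are not shown here.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    "base-model-name",  # placeholder for the actual base checkpoint
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("base-model-name")
```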