lbourdois committed on
Commit 54f14a5 · verified · 1 Parent(s): 4bcccd1

Improve language tag

Hi! As the model is multilingual, this PR adds languages other than English to the language tag to improve discoverability. Note that 29 languages are announced in the README, but only 13 are explicitly listed, so I was only able to add those 13.

Files changed (1): README.md (+253 −241)
README.md CHANGED
---
library_name: transformers
tags:
- unsloth
- trl
- grpo
license: mit
datasets:
- eagle0504/openai-gsm8k-enhanced-using-together-ai-deepseek-train8k-test1k-v1
language:
- zho
- eng
- fra
- spa
- por
- deu
- ita
- rus
- jpn
- kor
- vie
- tha
- ara
base_model:
- Qwen/Qwen2.5-1.5B-Instruct
---

# Qwen2.5-1.5B-Instruct Fine-Tuned on GSM8K with DeepSeek Augmentation

## Model Overview

This model is a fine-tuned version of **Qwen2.5-1.5B-Instruct**, designed for **mathematical problem-solving and structured reasoning**. It is trained on an **enhanced GSM8K dataset** incorporating **Chain-of-Thought (CoT) reasoning** augmented by **DeepSeek AI**.

### Key Features
- **Base Model:** Qwen2.5-1.5B-Instruct
- **Fine-Tuned On:** GSM8K enhanced with DeepSeek-V3
- **Optimized For:** Logical problem-solving and math reasoning
- **Fine-Tuning Method:** LoRA (Low-Rank Adaptation)
- **Inference-Ready:** Available on **Hugging Face** and compatible with `llama.cpp`
- **GGUF Support:** Quantized versions in **Q4_K_M, Q8_0, Q5_K_M, and FP16**

## Model Details

- **Developed by:** [Yiqiao Yin](https://www.y-yin.io/)
- **Model Type:** Causal Language Model (Text Generation)
- **Languages:** Multilingual (see the `language` tags above); the fine-tuning data is in English (`en`)
- **License:** MIT License
- **Fine-tuned from:** `Qwen/Qwen2.5-1.5B-Instruct`
- **Training Libraries:** `transformers` + `unsloth` + `trl`
- **Quantization:** GGUF (`Q4_K_M, Q8_0, Q5_K_M, f16`)

🔗 **Hugging Face Repository:**
👉 [Fine-tuned Qwen2.5-1.5B-Instruct](https://huggingface.co/eagle0504/qwen-2_5-1_5b-instruct-using-openai-gsm8k-data-enhanced-with-deepseek-v3)

## How to Use the Model

### Using `transformers` in Python
You may need to install `bitsandbytes` first:

```bash
pip install -U bitsandbytes
```

Then run inference with the following code.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model and tokenizer
model_name = "eagle0504/qwen-2_5-1_5b-instruct-using-openai-gsm8k-data-enhanced-with-deepseek-v3"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Move model to GPU if available
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

# Example inference
question = "Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?"
inputs = tokenizer(question, return_tensors="pt").to(device)
output = model.generate(**inputs, max_new_tokens=200)  # cap generated tokens rather than total length

# Decode response
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
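
Since this is an instruction-tuned model, responses are often better when the question is wrapped in the chat template. A minimal variant of the snippet above, reusing `tokenizer`, `model`, `device`, and `question` from it:

```python
# Wrap the question as a chat message and apply the model's chat template
messages = [{"role": "user", "content": question}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

inputs = tokenizer(prompt, return_tensors="pt").to(device)
output = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```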

## Running the Model with `llama.cpp`

### Step 1: Install `llama.cpp`
On macOS, via Homebrew:
```sh
brew install llama.cpp
```

### Step 2: Download the Model
```sh
mkdir -p ~/llama_models && cd ~/llama_models
wget https://huggingface.co/eagle0504/qwen-2_5-1_5b-instruct-using-openai-gsm8k-data-enhanced-with-deepseek-v3/resolve/main/q8_0.gguf
```
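
If you prefer the Hugging Face CLI, an equivalent download (assuming `huggingface_hub` is installed) is:

```sh
pip install -U "huggingface_hub[cli]"
huggingface-cli download eagle0504/qwen-2_5-1_5b-instruct-using-openai-gsm8k-data-enhanced-with-deepseek-v3 q8_0.gguf --local-dir ~/llama_models
```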

### Step 3: Run the Model
```sh
llama-cli -m ~/llama_models/q8_0.gguf --interactive
```

Alternatively, let `llama-cli` pull the quantized model straight from Hugging Face:

```sh
llama-cli -hf eagle0504/qwen-2_5-1_5b-instruct-using-openai-gsm8k-data-enhanced-with-deepseek-v3:Q8_0
```

### Step 4: Test with a Prompt
```sh
llama-cli -m ~/llama_models/q8_0.gguf -p "Explain quantum computing in simple terms."
```
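
If you would rather query the model over HTTP, `llama.cpp` also ships a local server; a minimal invocation (the port choice is arbitrary):

```sh
llama-server -m ~/llama_models/q8_0.gguf --port 8080
# then send completion requests to http://localhost:8080
```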

## Training Details

### Custom Reward

```python
def count_xml(text: str) -> float:
    """
    Calculates a reward based on the occurrence of certain XML tags and
    subtracts penalties for content after closing tags.

    Args:
        text (str): The text string to analyze for XML tag consistency.

    Returns:
        float: Total reward score based on XML tag occurrence and penalties.
    """
    count = 0.0
    if text.count("<think>\n") == 1:
        count += 0.125
    if text.count("\n</think>\n") == 1:
        count += 0.125
    if text.count("\n<answer>\n") == 1:
        count += 0.125
        count -= len(text.split("\n</answer>\n")[-1]) * 0.001
    if text.count("\n</answer>") == 1:
        count += 0.125
        count -= (len(text.split("\n</answer>")[-1]) - 1) * 0.001

    # Ensure `<think>` and `</think>` exist
    if "<think>" in text and "</think>" in text:
        count += 1.0  # Higher weight to ensure reasoning consistency
    else:
        count -= 1.0  # Penalize if missing

    return count
```

Each component contributes to the total reward **if its condition is met**:

| Condition | Reward |
|-----------|--------|
| `"<think>\n"` appears exactly **once** | **+0.125** |
| `"\n</think>\n"` appears exactly **once** | **+0.125** |
| `"\n<answer>\n"` appears exactly **once** | **+0.125** |
| `"\n</answer>"` appears exactly **once** | **+0.125** |
| Both `<think>` and `</think>` exist anywhere | **+1.0** |
| No extra text after `"</answer>"` | **No penalty** |

Total possible reward **before penalties**:
\[
0.125 + 0.125 + 0.125 + 0.125 + 1.0 = 1.5
\]

**Potential Penalties**
The function penalizes **extra content after `"</answer>"`**:
\[
-\left( \text{length of extra text} \times 0.001 \right)
\]
In the **best case** (no extra content):
- **Penalty = 0**
- **Final Reward = 1.5 (no deductions)**

---

**Best Possible Input Example**
This **ideal input** earns the highest possible reward:

```xml
<think>
Valid reasoning goes here.
</think>

<answer>
Correct final answer here.
</answer>
```

In other words, the reward function is customized to encourage answers that contain explicit reasoning. Because the maximum reward is known mathematically, it can be monitored during training.
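
As a quick sanity check, feeding the ideal string to `count_xml` reproduces the 1.5 maximum:

```python
ideal = (
    "<think>\n"
    "Valid reasoning goes here.\n"
    "</think>\n"
    "\n"
    "<answer>\n"
    "Correct final answer here.\n"
    "</answer>\n"
)
print(count_xml(ideal))  # 1.5 = four 0.125 tag bonuses + 1.0 think/answer bonus, no penalty
```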

### Dataset Used
The model was fine-tuned on:
🔹 [`eagle0504/openai-gsm8k-enhanced-using-together-ai-deepseek-train8k-test1k-v1`](https://huggingface.co/datasets/eagle0504/openai-gsm8k-enhanced-using-together-ai-deepseek-train8k-test1k-v1)

This dataset contains:
- **8K training samples**
- **1K testing samples**
- Features: `question`, `answer`, `cot` (Chain-of-Thought)
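
You can load and inspect it with `datasets` (the split names are assumed from the sample counts above):

```python
from datasets import load_dataset

ds = load_dataset(
    "eagle0504/openai-gsm8k-enhanced-using-together-ai-deepseek-train8k-test1k-v1"
)
print(ds)                     # expected: ~8K-row train split, ~1K-row test split
print(ds["train"][0].keys())  # question, answer, cot
```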

### Training Configuration
- **Framework:** `transformers` + `unsloth` + `trl`
- **Optimization:** LoRA applied to the QKV projections
- **Learning Rate:** `1e-6`
- **Optimizer:** AdamW (8-bit)
- **Precision:** Mixed (`bf16` or `fp16`)
- **Batch Size:** `8`
- **Max Sequence Length:** `1024`
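
For reference, here is a minimal sketch of how these settings could be wired together with `unsloth` and `trl`'s GRPO trainer. The exact training script was not published, so the LoRA rank/alpha and the reward wrapper below are assumptions:

```python
from unsloth import FastLanguageModel
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Base model via Unsloth, with the max sequence length from the table above
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Qwen/Qwen2.5-1.5B-Instruct",
    max_seq_length=1024,
)

# LoRA on the QKV projections; rank and alpha are assumptions
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj"],
    lora_alpha=16,
)

# GRPOTrainer expects a "prompt" column; the dataset stores it as "question"
train_ds = load_dataset(
    "eagle0504/openai-gsm8k-enhanced-using-together-ai-deepseek-train8k-test1k-v1",
    split="train",
).map(lambda row: {"prompt": [{"role": "user", "content": row["question"]}]})

# trl passes batched conversational completions; unwrap the text and score it
def xml_count_reward(completions, **kwargs):
    contents = [completion[0]["content"] for completion in completions]
    return [count_xml(text) for text in contents]

args = GRPOConfig(
    output_dir="outputs",
    learning_rate=1e-6,
    optim="adamw_8bit",
    per_device_train_batch_size=8,
    bf16=True,  # or fp16=True, depending on the GPU
)

trainer = GRPOTrainer(
    model=model,
    processing_class=tokenizer,
    reward_funcs=[xml_count_reward],
    args=args,
    train_dataset=train_ds,
)
trainer.train()
```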

## Model Performance

### Reward Progression (XML Count)
| Step | XML Count Reward |
|------|------------------|
| 10 | -1 |
| 100 | -1 |
| 500 | -0.6421 |
| 750 | 0.7611 |
| 1000 | 1.0506 |

After 1000 steps, the XML Count reward climbs above 1, indicating the model has learned the expected output format. Training took about `1h 46min 50s` on a T4 GPU in Colab with high RAM.

## Bias, Risks, and Limitations

### Potential Risks
- May **hallucinate** incorrect reasoning steps if prompts are unclear.
- Could struggle with **complex mathematical problems** outside its training data.
- **Limited generalization** to non-math reasoning tasks.

### Recommendations
- If using this model for **critical applications**, verify outputs with human review.
- For **better performance**, fine-tune on **larger datasets** with real-world numerical reasoning.

## Environmental Impact

**Estimated Carbon Emissions:**
- **Hardware Used:** NVIDIA A100 GPU
- **Training Time:** ~5 hours
- **Estimated CO2 Emitted:** ~8.2 kg CO2eq (via the [ML Impact Calculator](https://mlco2.github.io/impact#compute))

## Citation

Upcoming

## Contact
For questions, suggestions, or issues, reach out via [Hugging Face Discussions](https://huggingface.co/eagle0504/qwen-2_5-1_5b-instruct-using-openai-gsm8k-data-enhanced-with-deepseek-v3/discussions).

---

🎉 **Thank you for using this model!** If you find it useful, please ⭐ it on **Hugging Face**! 🚀🔥