Update README.md
README.md
CHANGED
@@ -6,15 +6,72 @@ tags:
|
|
6 |
- generated_from_trainer
|
7 |
datasets:
|
8 |
- trl-lib/tldr
|
9 |
-
model-index:
|
10 |
-
- name: mnt/nvme2/alexandre/spft/sparse/lr2e-5_ep2_norm3/Sparse-Llama-3.1-8B-2of4-tldr
|
11 |
-
results: []
|
12 |
---
|
13 |
|
14 |
-
<!-- This model card has been generated automatically according to the information the Trainer had access to. You
|
15 |
-
should probably proofread and complete it, then remove this comment. -->
|
16 |
-
|
17 |
[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
|
18 |
<details><summary>See axolotl config</summary>
|
19 |
|
20 |
axolotl version: `0.10.0.dev0`
|
@@ -36,9 +93,6 @@ datasets:
|
|
36 |
no_input_format: "<|user|>\n{instruction}\n<|assistant|>\n"
|
37 |
split: train
|
38 |
|
39 |
-
dataset_prepared_path: /mnt/nvme2/alexandre/spft/sparse/lr2e-5_ep2_norm3/last_run_prepared
|
40 |
-
output_dir: /mnt/nvme2/alexandre/spft/sparse/lr2e-5_ep2_norm3/Sparse-Llama-3.1-8B-2of4-tldr
|
41 |
-
|
42 |
sequence_len: 4096
|
43 |
sample_packing: true
|
44 |
pad_to_sequence_len: true
|
@@ -109,30 +163,9 @@ llmcompressor:
|
|
109 |
start: 0
|
110 |
save_compressed: true
|
111 |
```
|
112 |
-
|
113 |
</details><br>
|
114 |
|
115 |
-
|
116 |
-
|
117 |
-
This model is a fine-tuned version of [RedHatAI/Sparse-Llama-3.1-8B-2of4](https://huggingface.co/RedHatAI/Sparse-Llama-3.1-8B-2of4) on the trl-lib/tldr dataset.
|
118 |
-
It achieves the following results on the evaluation set:
|
119 |
-
- Loss: 1.8064
|
120 |
-
|
121 |
-
## Model description
|
122 |
-
|
123 |
-
More information needed
|
124 |
-
|
125 |
-
## Intended uses & limitations
|
126 |
-
|
127 |
-
More information needed
|
128 |
-
|
129 |
-
## Training and evaluation data
|
130 |
-
|
131 |
-
More information needed
|
132 |
-
|
133 |
-
## Training procedure
|
134 |
-
|
135 |
-
### Training hyperparameters
|
136 |
|
137 |
The following hyperparameters were used during training:
|
138 |
- learning_rate: 2e-05
|
@@ -148,7 +181,9 @@ The following hyperparameters were used during training:
|
|
148 |
- lr_scheduler_warmup_steps: 32
|
149 |
- num_epochs: 2.0
|
150 |
|
151 |
-
|
|
|
|
|
152 |
|
153 |
| Training Loss | Epoch | Step | Validation Loss |
|
154 |
|:-------------:|:------:|:----:|:---------------:|
|
@@ -161,10 +196,85 @@ The following hyperparameters were used during training:
|
|
161 |
| 1.6955 | 1.5046 | 492 | 1.8065 |
|
162 |
| 1.762 | 1.7554 | 574 | 1.8064 |
|
163 |
|
|
|
164 |
|
165 |
-
|
166 |
|
167 |
- Transformers 4.51.3
|
168 |
- Pytorch 2.7.0+cu126
|
169 |
- Datasets 3.5.1
|
170 |
- Tokenizers 0.21.1
|
6 |
- generated_from_trainer
|
7 |
datasets:
|
8 |
- trl-lib/tldr
|
|
|
|
|
|
|
9 |
---
|
10 |
|
|
|
|
|
|
|
11 |
[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
|
12 |
+
|
13 |
+
# Sparse-Llama-3.1-8B-tldr-2of4
|
14 |
+
|
15 |
+
## Model Overview
|
16 |
+
- **Model Architecture:** LlamaForCausalLM
|
17 |
+
- **Input:** Text
|
18 |
+
- **Output:** Text
|
19 |
+
- **Model Optimizations:**
|
20 |
+
- **Sparsity:** 2:4
|
21 |
+
- **Release Date:** 05/29/2025
|
22 |
+
- **Version:** 1.0
|
23 |
+
- **Intended Use Cases:** This model is fine-tuned to summarize text in the style of Reddit posts.
|
24 |
+
- **Out-of-scope:** Use in any manner that violates applicable laws or regulations (including trade compliance laws). Use in any other way that is prohibited by the Acceptable Use Policy and Llama 3.1 Community License.
|
25 |
+
- **Model Developers:** Red Hat (Neural Magic)
|
26 |
+
|
27 |
+
This model is a fine-tuned version of [RedHatAI/Sparse-Llama-3.1-8B-2of4](https://huggingface.co/RedHatAI/Sparse-Llama-3.1-8B-2of4) on the trl-lib/tldr dataset.
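
As a rough illustration of the 2:4 sparsity pattern listed under Model Optimizations: in every contiguous group of four weights, at most two are non-zero. The sketch below uses simple magnitude pruning on a flat list of floats to produce and check that pattern. The helper names `prune_2of4` and `is_2of4_sparse` are hypothetical, and magnitude pruning is only an assumption made for illustration; it is not necessarily how this checkpoint was sparsified.

```python
from typing import List


def prune_2of4(row: List[float]) -> List[float]:
    """Keep the two largest-magnitude weights in each group of four.

    Illustrative magnitude pruning only (an assumption for this sketch),
    not the procedure used to produce the released checkpoint.
    """
    if len(row) % 4 != 0:
        raise ValueError("row length must be a multiple of 4")
    pruned: List[float] = []
    for start in range(0, len(row), 4):
        group = row[start:start + 4]
        # Positions of the two largest-magnitude weights in this group.
        keep = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)[:2]
        pruned.extend(w if i in keep else 0.0 for i, w in enumerate(group))
    return pruned


def is_2of4_sparse(row: List[float]) -> bool:
    """Return True if every group of four has at most two non-zero weights."""
    return all(
        sum(1 for w in row[start:start + 4] if w != 0.0) <= 2
        for start in range(0, len(row), 4)
    )


dense = [0.9, -0.1, 0.05, -1.2, 0.3, 0.7, -0.2, 0.0]
sparse = prune_2of4(dense)
print(sparse, is_2of4_sparse(sparse))
```

Hardware that supports this pattern can skip the zeroed half of each group, which is what makes the format attractive for inference.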
|
28 |
+
|
29 |
+
## Deployment
|
30 |
+
|
31 |
+
This model can be deployed efficiently using [vLLM](https://docs.vllm.ai/en/latest/), as shown in the example below.
|
32 |
+
|
33 |
+
Run the following command to start the vLLM server:
|
34 |
+
```bash
vllm serve nm-testing/Sparse-Llama-3.1-8B-tldr-2of4
```
|
37 |
+
|
38 |
+
Once the server is running, you can query the model through vLLM's OpenAI-compatible API:
|
39 |
+
|
40 |
+
```python
from openai import OpenAI

# Modify OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"
client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

post = """
SUBREDDIT: r/AI

TITLE: Training sparse LLMs

POST: Now you can use the llm-compressor integration to axolotl to train sparse LLMs!

It's super easy to use. See the example in https://huggingface.co/nm-testing/Sparse-Llama-3.1-8B-tldr-2of4.

And there's more. You can run 2:4 sparse models on vLLM and get significant speedups on Hopper GPUs!
"""

prompt = f"Give a TL;DR of the following Reddit post.\n<|user|>{post}\nTL;DR:\n<|assistant|>\n"

completion = client.completions.create(
    model="nm-testing/Sparse-Llama-3.1-8B-tldr-2of4",
    prompt=prompt,
    max_tokens=256,
)
print("Completion result:", completion)
```
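
The prompt layout used in the example above can be wrapped in a small helper so that other posts are formatted the same way. This is a minimal sketch: the function name `build_tldr_prompt` and its parameters are hypothetical, and the layout simply mirrors the `post` and `prompt` strings shown in this card.

```python
def build_tldr_prompt(subreddit: str, title: str, body: str) -> str:
    """Format a Reddit post using the same prompt layout as the example above."""
    # Same structure as the triple-quoted `post` string in the card.
    post = f"\nSUBREDDIT: r/{subreddit}\n\nTITLE: {title}\n\nPOST: {body}\n"
    return (
        "Give a TL;DR of the following Reddit post.\n"
        f"<|user|>{post}\nTL;DR:\n<|assistant|>\n"
    )


prompt = build_tldr_prompt("AI", "Training sparse LLMs", "Sparse models are fast.")
print(prompt)
```

When the resulting string is sent through `client.completions.create(...)`, the generated summary text is available as `completion.choices[0].text`.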
|
72 |
+
|
73 |
+
## Training
|
74 |
+
|
75 |
<details><summary>See axolotl config</summary>
|
76 |
|
77 |
axolotl version: `0.10.0.dev0`
|
|
|
93 |
no_input_format: "<|user|>\n{instruction}\n<|assistant|>\n"
|
94 |
split: train
|
95 |
|
|
|
|
|
|
|
96 |
sequence_len: 4096
|
97 |
sample_packing: true
|
98 |
pad_to_sequence_len: true
|
|
|
163 |
start: 0
|
164 |
save_compressed: true
|
165 |
```
|
|
|
166 |
</details><br>
|
167 |
|
168 |
+
<details><summary>Training hyperparameters</summary>
|
169 |
|
170 |
The following hyperparameters were used during training:
|
171 |
- learning_rate: 2e-05
|
|
|
181 |
- lr_scheduler_warmup_steps: 32
|
182 |
- num_epochs: 2.0
|
183 |
|
184 |
+
</details><br>
|
185 |
+
|
186 |
+
<details><summary>Training results</summary>
|
187 |
|
188 |
| Training Loss | Epoch | Step | Validation Loss |
|
189 |
|:-------------:|:------:|:----:|:---------------:|
|
|
|
196 |
| 1.6955 | 1.5046 | 492 | 1.8065 |
|
197 |
| 1.762 | 1.7554 | 574 | 1.8064 |
|
198 |
|
199 |
+
</details><br>
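
To put the final validation loss in more familiar terms, it can be converted to a perplexity. This sketch assumes the reported loss is a mean per-token cross-entropy in nats, which is the usual convention for this kind of training loop but is not stated in this card.

```python
import math

# Final validation loss from the last row of the training results table.
final_validation_loss = 1.8064

# Assumption: the loss is mean per-token cross-entropy in nats,
# so perplexity is its exponential.
perplexity = math.exp(final_validation_loss)
print(f"validation perplexity: {perplexity:.2f}")
```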
|
200 |
|
201 |
+
<details><summary>Framework versions</summary>
|
202 |
|
203 |
- Transformers 4.51.3
|
204 |
- Pytorch 2.7.0+cu126
|
205 |
- Datasets 3.5.1
|
206 |
- Tokenizers 0.21.1
|
207 |
+
|
208 |
+
</details><br>
|
209 |
+
|
210 |
+
## Evaluation
|
211 |
+
|
212 |
+
The model was evaluated on the test split of trl-lib/tldr using the Neural Magic fork of [lm-evaluation-harness](https://github.com/neuralmagic/lm-evaluation-harness/tree/tldr) (tldr branch).
|
213 |
+
These results can be reproduced with the following command:
|
214 |
+
|
215 |
+
```bash
lm_eval --model vllm --model_args "pretrained=nm-testing/Sparse-Llama-3.1-8B-tldr-2of4,dtype=auto,add_bos_token=True" --batch_size auto --tasks tldr
```
|
218 |
+
|
219 |
+
<table>
  <tr>
    <th>Metric</th>
    <th>Llama-3.1-8B</th>
    <th>Llama-3.1-8B-Instruct</th>
    <th>Llama-3.1-8B-tldr</th>
    <th>Sparse-Llama-3.1-8B-tldr<br>(this model)</th>
  </tr>
  <tr>
    <td>BERTScore</td>
    <td>0.087</td>
    <td>-0.230</td>
    <td>0.366</td>
    <td>0.366</td>
  </tr>
  <tr>
    <td>ROUGE-1</td>
    <td>0.187</td>
    <td>0.059</td>
    <td>0.362</td>
    <td>0.357</td>
  </tr>
  <tr>
    <td>ROUGE-2</td>
    <td>0.068</td>
    <td>0.018</td>
    <td>0.144</td>
    <td>0.141</td>
  </tr>
  <tr>
    <td>ROUGE-Lsum</td>
    <td>0.161</td>
    <td>0.051</td>
    <td>0.306</td>
    <td>0.304</td>
  </tr>
</table>
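
For readers unfamiliar with the ROUGE rows in the table, the sketch below computes a simplified ROUGE-1 F1 score as clipped unigram overlap between a reference summary and a generated one. It is an illustration under stated assumptions (lowercasing, whitespace tokenization, no stemming); the function name `rouge1_f1` is hypothetical, and the evaluation harness may tokenize and aggregate differently, so this is not expected to reproduce the reported numbers exactly.

```python
from collections import Counter


def rouge1_f1(reference: str, candidate: str) -> float:
    """Simplified ROUGE-1 F1: clipped unigram overlap between two texts."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


score = rouge1_f1(
    "sparse models run fast on hopper gpus",
    "sparse models are fast",
)
print(f"ROUGE-1 F1: {score:.3f}")
```

ROUGE-2 follows the same idea with bigrams instead of unigrams, while ROUGE-Lsum is based on the longest common subsequence.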
|