alexmarques committed (verified) · Commit c05a64c · Parent(s): 0253769

Update README.md

Files changed (1):
  1. README.md  +143 -33

README.md CHANGED
@@ -6,15 +6,72 @@ tags:
   - generated_from_trainer
 datasets:
   - trl-lib/tldr
- model-index:
- - name: mnt/nvme2/alexandre/spft/sparse/lr2e-5_ep2_norm3/Sparse-Llama-3.1-8B-2of4-tldr
-   results: []
 ---
 
- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->
-
 [<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
 
 <details><summary>See axolotl config</summary>
 
 axolotl version: `0.10.0.dev0`
@@ -36,9 +93,6 @@ datasets:
 no_input_format: "<|user|>\n{instruction}\n<|assistant|>\n"
 split: train
 
- dataset_prepared_path: /mnt/nvme2/alexandre/spft/sparse/lr2e-5_ep2_norm3/last_run_prepared
- output_dir: /mnt/nvme2/alexandre/spft/sparse/lr2e-5_ep2_norm3/Sparse-Llama-3.1-8B-2of4-tldr
-
 sequence_len: 4096
 sample_packing: true
 pad_to_sequence_len: true
@@ -109,30 +163,9 @@ llmcompressor:
   start: 0
 save_compressed: true
 ```
-
 </details><br>
 
- # mnt/nvme2/alexandre/spft/sparse/lr2e-5_ep2_norm3/Sparse-Llama-3.1-8B-2of4-tldr
-
- This model is a fine-tuned version of [RedHatAI/Sparse-Llama-3.1-8B-2of4](https://huggingface.co/RedHatAI/Sparse-Llama-3.1-8B-2of4) on the trl-lib/tldr dataset.
- It achieves the following results on the evaluation set:
- - Loss: 1.8064
-
- ## Model description
-
- More information needed
-
- ## Intended uses & limitations
-
- More information needed
-
- ## Training and evaluation data
-
- More information needed
-
- ## Training procedure
-
- ### Training hyperparameters
 
 The following hyperparameters were used during training:
 - learning_rate: 2e-05
 
@@ -148,7 +181,9 @@ The following hyperparameters were used during training:
 - lr_scheduler_warmup_steps: 32
 - num_epochs: 2.0
 
- ### Training results
 
 | Training Loss | Epoch | Step | Validation Loss |
 |:-------------:|:------:|:----:|:---------------:|
 
@@ -161,10 +196,85 @@ The following hyperparameters were used during training:
 | 1.6955 | 1.5046 | 492 | 1.8065 |
 | 1.762 | 1.7554 | 574 | 1.8064 |
 
- ### Framework versions
 
 - Transformers 4.51.3
 - Pytorch 2.7.0+cu126
 - Datasets 3.5.1
 - Tokenizers 0.21.1
   - generated_from_trainer
 datasets:
   - trl-lib/tldr
 ---
 
 [<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
+
+ # Sparse-Llama-3.1-8B-tldr-2of4
+
+ ## Model Overview
+ - **Model Architecture:** LlamaForCausalLM
+ - **Input:** Text
+ - **Output:** Text
+ - **Model Optimizations:**
+   - **Sparsity:** 2:4
+ - **Release Date:** 05/29/2025
+ - **Version:** 1.0
+ - **Intended Use Cases:** This model is fine-tuned to produce TL;DR-style summaries of Reddit posts.
+ - **Out-of-scope:** Use in any manner that violates applicable laws or regulations (including trade compliance laws). Use in any other way that is prohibited by the Acceptable Use Policy and the Llama 3.1 Community License.
+ - **Model Developers:** Red Hat (Neural Magic)
+
+ This model is a fine-tuned version of [RedHatAI/Sparse-Llama-3.1-8B-2of4](https://huggingface.co/RedHatAI/Sparse-Llama-3.1-8B-2of4) on the trl-lib/tldr dataset.
+
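The 2:4 pattern means that in every contiguous group of four weights, at most two are nonzero. As a minimal sketch (the helper and toy tensor below are illustrative, not part of the checkpoint's tooling), such a pattern can be checked with plain PyTorch:

```python
import torch

def is_2_of_4_sparse(weight: torch.Tensor) -> bool:
    # Group the weights into chunks of 4 along the last dimension
    # and count the nonzero entries in each chunk.
    groups = weight.reshape(-1, 4)
    nonzeros_per_group = (groups != 0).sum(dim=1)
    return bool((nonzeros_per_group <= 2).all())

# Toy 2:4-sparse tensor: every group of 4 holds at most 2 nonzeros.
w = torch.tensor([[0.5, 0.0, -0.3, 0.0],
                  [0.0, 1.2, 0.0, 0.7]])
print(is_2_of_4_sparse(w))  # True
```
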
+ ## Deployment
+
+ This model can be deployed efficiently using [vLLM](https://docs.vllm.ai/en/latest/), as shown in the example below.
+
+ Run the following command to start the vLLM server:
+ ```bash
+ vllm serve nm-testing/Sparse-Llama-3.1-8B-tldr-2of4
+ ```
+
+ Once the server is running, you can query the model through vLLM's OpenAI-compatible API:
+
+ ```python
+ from openai import OpenAI
+
+ # Point the OpenAI client at vLLM's API server.
+ openai_api_key = "EMPTY"
+ openai_api_base = "http://localhost:8000/v1"
+ client = OpenAI(
+     api_key=openai_api_key,
+     base_url=openai_api_base,
+ )
+
+ post = """
+ SUBREDDIT: r/AI
+
+ TITLE: Training sparse LLMs
+
+ POST: Now you can use the llm-compressor integration in axolotl to train sparse LLMs!
+
+ It's super easy to use. See the example in https://huggingface.co/nm-testing/Sparse-Llama-3.1-8B-tldr-2of4.
+
+ And there's more. You can run 2:4 sparse models on vLLM and get significant speedups on Hopper GPUs!
+ """
+
+ prompt = f"Give a TL;DR of the following Reddit post.\n<|user|>{post}\nTL;DR:\n<|assistant|>\n"
+
+ completion = client.completions.create(
+     model="nm-testing/Sparse-Llama-3.1-8B-tldr-2of4",
+     prompt=prompt,
+     max_tokens=256,
+ )
+ print("Completion result:", completion)
+ ```
+
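For quick local tests without a server, vLLM's offline Python API can be used instead. The following is a minimal sketch that assumes the same prompt format as the server example above:

```python
from vllm import LLM, SamplingParams

# Load the 2:4 sparse checkpoint for offline (serverless) inference.
llm = LLM(model="nm-testing/Sparse-Llama-3.1-8B-tldr-2of4")

post = "SUBREDDIT: r/AI\n\nTITLE: Training sparse LLMs\n\nPOST: You can now train and deploy 2:4 sparse LLMs."
prompt = f"Give a TL;DR of the following Reddit post.\n<|user|>{post}\nTL;DR:\n<|assistant|>\n"

# Greedy decoding with the same token budget as the server example.
sampling_params = SamplingParams(temperature=0.0, max_tokens=256)
outputs = llm.generate([prompt], sampling_params)
print(outputs[0].outputs[0].text)
```
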
+ ## Training
+
 <details><summary>See axolotl config</summary>
 
 axolotl version: `0.10.0.dev0`
 
 no_input_format: "<|user|>\n{instruction}\n<|assistant|>\n"
 split: train
 
 sequence_len: 4096
 sample_packing: true
 pad_to_sequence_len: true
 
 start: 0
 save_compressed: true
 ```
 </details><br>
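For reference, a minimal sketch of how a trl-lib/tldr record maps onto the `no_input_format` template from the config above; the `prompt`/`completion` column names are assumptions about the dataset layout:

```python
from datasets import load_dataset

# Load the same dataset used for fine-tuning (train split).
ds = load_dataset("trl-lib/tldr", split="train")
example = ds[0]

# Assumed mapping: the dataset's "prompt" column fills {instruction} in the
# config's no_input_format template, followed by the reference "completion".
text = f"<|user|>\n{example['prompt']}\n<|assistant|>\n{example['completion']}"
print(text)
```
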
 
+ <details><summary>Training hyperparameters</summary>
 
 The following hyperparameters were used during training:
 - learning_rate: 2e-05
 
 - lr_scheduler_warmup_steps: 32
 - num_epochs: 2.0
 
+ </details><br>
+
+ <details><summary>Training results</summary>
 
 | Training Loss | Epoch | Step | Validation Loss |
 |:-------------:|:------:|:----:|:---------------:|
 
 | 1.6955 | 1.5046 | 492 | 1.8065 |
 | 1.762 | 1.7554 | 574 | 1.8064 |
 
+ </details><br>
 
+ <details><summary>Framework versions</summary>
 
 - Transformers 4.51.3
 - Pytorch 2.7.0+cu126
 - Datasets 3.5.1
 - Tokenizers 0.21.1
+
+ </details><br>
+
+ ## Evaluation
+
+ The model was evaluated on the test split of trl-lib/tldr using the Neural Magic fork of [lm-evaluation-harness](https://github.com/neuralmagic/lm-evaluation-harness/tree/tldr) (tldr branch).
+ The results can be reproduced with the following command:
+
+ ```bash
+ lm_eval --model vllm --model_args "pretrained=nm-testing/Sparse-Llama-3.1-8B-tldr-2of4,dtype=auto,add_bos_token=True" --batch_size auto --tasks tldr
+ ```
+
+ <table>
+   <tr> <th>Metric</th> <th>Llama-3.1-8B</th> <th>Llama-3.1-8B-Instruct</th> <th>Llama-3.1-8B-tldr</th> <th>Sparse-Llama-3.1-8B-tldr<br>(this model)</th> </tr>
+   <tr> <td>BERTScore</td> <td>0.087</td> <td>-0.230</td> <td>0.366</td> <td>0.366</td> </tr>
+   <tr> <td>ROUGE-1</td> <td>0.187</td> <td>0.059</td> <td>0.362</td> <td>0.357</td> </tr>
+   <tr> <td>ROUGE-2</td> <td>0.068</td> <td>0.018</td> <td>0.144</td> <td>0.141</td> </tr>
+   <tr> <td>ROUGE-Lsum</td> <td>0.161</td> <td>0.051</td> <td>0.306</td> <td>0.304</td> </tr>
+ </table>
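As a rough, standalone illustration of the summary metrics reported above (not the harness's exact implementation), generated summaries can be scored against reference TL;DRs with the `evaluate` library, assuming its `rouge_score` and `bert_score` dependencies are installed:

```python
import evaluate

# Hypothetical model outputs and reference TL;DRs from the test split.
predictions = ["A short TL;DR produced by the model."]
references = ["The reference TL;DR written by the post author."]

rouge = evaluate.load("rouge")          # reports rouge1, rouge2, rougeLsum, ...
bertscore = evaluate.load("bertscore")  # reports per-example precision/recall/f1

print(rouge.compute(predictions=predictions, references=references))
print(bertscore.compute(predictions=predictions, references=references, lang="en"))
```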