alexmarques committed (verified) · Commit c05a64c · Parent(s): 0253769

Update README.md

Files changed (1):
  1. README.md  +143 -33

README.md CHANGED
@@ -6,15 +6,72 @@ tags:
   - generated_from_trainer
 datasets:
   - trl-lib/tldr
- model-index:
- - name: mnt/nvme2/alexandre/spft/sparse/lr2e-5_ep2_norm3/Sparse-Llama-3.1-8B-2of4-tldr
-   results: []
 ---
 
- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->
-
 [<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
 
 <details><summary>See axolotl config</summary>
 
 axolotl version: `0.10.0.dev0`
@@ -36,9 +93,6 @@ datasets:
 no_input_format: "<|user|>\n{instruction}\n<|assistant|>\n"
 split: train
 
- dataset_prepared_path: /mnt/nvme2/alexandre/spft/sparse/lr2e-5_ep2_norm3/last_run_prepared
- output_dir: /mnt/nvme2/alexandre/spft/sparse/lr2e-5_ep2_norm3/Sparse-Llama-3.1-8B-2of4-tldr
-
 sequence_len: 4096
 sample_packing: true
 pad_to_sequence_len: true
@@ -109,30 +163,9 @@ llmcompressor:
   start: 0
 save_compressed: true
 ```
-
 </details><br>
 
- # mnt/nvme2/alexandre/spft/sparse/lr2e-5_ep2_norm3/Sparse-Llama-3.1-8B-2of4-tldr
-
- This model is a fine-tuned version of [RedHatAI/Sparse-Llama-3.1-8B-2of4](https://huggingface.co/RedHatAI/Sparse-Llama-3.1-8B-2of4) on the trl-lib/tldr dataset.
- It achieves the following results on the evaluation set:
- - Loss: 1.8064
-
- ## Model description
-
- More information needed
-
- ## Intended uses & limitations
-
- More information needed
-
- ## Training and evaluation data
-
- More information needed
-
- ## Training procedure
-
- ### Training hyperparameters
 
 The following hyperparameters were used during training:
 - learning_rate: 2e-05
 
@@ -148,7 +181,9 @@ The following hyperparameters were used during training:
 - lr_scheduler_warmup_steps: 32
 - num_epochs: 2.0
 
- ### Training results
 
 | Training Loss | Epoch | Step | Validation Loss |
 |:-------------:|:------:|:----:|:---------------:|
 
@@ -161,10 +196,85 @@ The following hyperparameters were used during training:
 | 1.6955 | 1.5046 | 492 | 1.8065 |
 | 1.762 | 1.7554 | 574 | 1.8064 |
 
- ### Framework versions
 
 - Transformers 4.51.3
 - Pytorch 2.7.0+cu126
 - Datasets 3.5.1
 - Tokenizers 0.21.1
   - generated_from_trainer
 datasets:
   - trl-lib/tldr
 ---
 
 [<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
+
+ # Sparse-Llama-3.1-8B-tldr-2of4
+
+ ## Model Overview
+ - **Model Architecture:** LlamaForCausalLM
+ - **Input:** Text
+ - **Output:** Text
+ - **Model Optimizations:**
+   - **Sparsity:** 2:4
+ - **Release Date:** 05/29/2025
+ - **Version:** 1.0
+ - **Intended Use Cases:** This model is fine-tuned to produce TL;DR-style summaries of Reddit posts.
+ - **Out-of-scope:** Use in any manner that violates applicable laws or regulations (including trade compliance laws). Use in any other way that is prohibited by the Acceptable Use Policy and the Llama 3.1 Community License.
+ - **Model Developers:** Red Hat (Neural Magic)
+
+ This model is a fine-tuned version of [RedHatAI/Sparse-Llama-3.1-8B-2of4](https://huggingface.co/RedHatAI/Sparse-Llama-3.1-8B-2of4) on the trl-lib/tldr dataset.
+
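The 2:4 pattern means that in every contiguous group of four weights, at most two are nonzero. As a minimal sketch (the helper and toy tensor below are illustrative, not part of the checkpoint's tooling), such a pattern can be checked with plain PyTorch:

```python
import torch

def is_2_of_4_sparse(weight: torch.Tensor) -> bool:
    # Group the weights into chunks of 4 along the last dimension
    # and count the nonzero entries in each chunk.
    groups = weight.reshape(-1, 4)
    nonzeros_per_group = (groups != 0).sum(dim=1)
    return bool((nonzeros_per_group <= 2).all())

# Toy 2:4-sparse tensor: every group of 4 holds at most 2 nonzeros.
w = torch.tensor([[0.5, 0.0, -0.3, 0.0],
                  [0.0, 1.2, 0.0, 0.7]])
print(is_2_of_4_sparse(w))  # True
```
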
+ ## Deployment
+
+ This model can be deployed efficiently using [vLLM](https://docs.vllm.ai/en/latest/), as shown in the example below.
+
+ Run the following command to start the vLLM server:
+ ```bash
+ vllm serve nm-testing/Sparse-Llama-3.1-8B-tldr-2of4
+ ```
+
+ Once the server is running, you can query the model through vLLM's OpenAI-compatible API:
+
+ ```python
+ from openai import OpenAI
+
+ # Point the OpenAI client at vLLM's API server.
+ openai_api_key = "EMPTY"
+ openai_api_base = "http://localhost:8000/v1"
+ client = OpenAI(
+     api_key=openai_api_key,
+     base_url=openai_api_base,
+ )
+
+ post = """
+ SUBREDDIT: r/AI
+
+ TITLE: Training sparse LLMs
+
+ POST: Now you can use the llm-compressor integration in axolotl to train sparse LLMs!
+
+ It's super easy to use. See the example in https://huggingface.co/nm-testing/Sparse-Llama-3.1-8B-tldr-2of4.
+
+ And there's more. You can run 2:4 sparse models on vLLM and get significant speedups on Hopper GPUs!
+ """
+
+ prompt = f"Give a TL;DR of the following Reddit post.\n<|user|>{post}\nTL;DR:\n<|assistant|>\n"
+
+ completion = client.completions.create(
+     model="nm-testing/Sparse-Llama-3.1-8B-tldr-2of4",
+     prompt=prompt,
+     max_tokens=256,
+ )
+ print("Completion result:", completion)
+ ```
+
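For quick local tests without a server, vLLM's offline Python API can be used instead. The following is a minimal sketch that assumes the same prompt format as the server example above:

```python
from vllm import LLM, SamplingParams

# Load the 2:4 sparse checkpoint for offline (serverless) inference.
llm = LLM(model="nm-testing/Sparse-Llama-3.1-8B-tldr-2of4")

post = "SUBREDDIT: r/AI\n\nTITLE: Training sparse LLMs\n\nPOST: You can now train and deploy 2:4 sparse LLMs."
prompt = f"Give a TL;DR of the following Reddit post.\n<|user|>{post}\nTL;DR:\n<|assistant|>\n"

# Greedy decoding with the same token budget as the server example.
sampling_params = SamplingParams(temperature=0.0, max_tokens=256)
outputs = llm.generate([prompt], sampling_params)
print(outputs[0].outputs[0].text)
```
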
+ ## Training
+
 <details><summary>See axolotl config</summary>
 
 axolotl version: `0.10.0.dev0`
 
 no_input_format: "<|user|>\n{instruction}\n<|assistant|>\n"
 split: train
 
 sequence_len: 4096
 sample_packing: true
 pad_to_sequence_len: true
 
 start: 0
 save_compressed: true
 ```
 </details><br>
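For reference, a minimal sketch of how a trl-lib/tldr record maps onto the `no_input_format` template from the config above; the `prompt`/`completion` column names are assumptions about the dataset layout:

```python
from datasets import load_dataset

# Load the same dataset used for fine-tuning (train split).
ds = load_dataset("trl-lib/tldr", split="train")
example = ds[0]

# Assumed mapping: the dataset's "prompt" column fills {instruction} in the
# config's no_input_format template, followed by the reference "completion".
text = f"<|user|>\n{example['prompt']}\n<|assistant|>\n{example['completion']}"
print(text)
```
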
 
+ <details><summary>Training hyperparameters</summary>
 
 The following hyperparameters were used during training:
 - learning_rate: 2e-05
 
 - lr_scheduler_warmup_steps: 32
 - num_epochs: 2.0
 
+ </details><br>
+
+ <details><summary>Training results</summary>
 
 | Training Loss | Epoch | Step | Validation Loss |
 |:-------------:|:------:|:----:|:---------------:|
 
 | 1.6955 | 1.5046 | 492 | 1.8065 |
 | 1.762 | 1.7554 | 574 | 1.8064 |
 
+ </details><br>
 
+ <details><summary>Framework versions</summary>
 
 - Transformers 4.51.3
 - Pytorch 2.7.0+cu126
 - Datasets 3.5.1
 - Tokenizers 0.21.1
+
+ </details><br>
+
+ ## Evaluation
+
+ The model was evaluated on the test split of trl-lib/tldr using the Neural Magic fork of [lm-evaluation-harness](https://github.com/neuralmagic/lm-evaluation-harness/tree/tldr) (tldr branch).
+ The results can be reproduced with the following command:
+
+ ```bash
+ lm_eval --model vllm --model_args "pretrained=nm-testing/Sparse-Llama-3.1-8B-tldr-2of4,dtype=auto,add_bos_token=True" --batch_size auto --tasks tldr
+ ```
+
+ <table>
+   <tr> <th>Metric</th> <th>Llama-3.1-8B</th> <th>Llama-3.1-8B-Instruct</th> <th>Llama-3.1-8B-tldr</th> <th>Sparse-Llama-3.1-8B-tldr<br>(this model)</th> </tr>
+   <tr> <td>BERTScore</td> <td>0.087</td> <td>-0.230</td> <td>0.366</td> <td>0.366</td> </tr>
+   <tr> <td>ROUGE-1</td> <td>0.187</td> <td>0.059</td> <td>0.362</td> <td>0.357</td> </tr>
+   <tr> <td>ROUGE-2</td> <td>0.068</td> <td>0.018</td> <td>0.144</td> <td>0.141</td> </tr>
+   <tr> <td>ROUGE-Lsum</td> <td>0.161</td> <td>0.051</td> <td>0.306</td> <td>0.304</td> </tr>
+ </table>
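As a rough, standalone illustration of the summary metrics reported above (not the harness's exact implementation), generated summaries can be scored against reference TL;DRs with the `evaluate` library, assuming its `rouge_score` and `bert_score` dependencies are installed:

```python
import evaluate

# Hypothetical model outputs and reference TL;DRs from the test split.
predictions = ["A short TL;DR produced by the model."]
references = ["The reference TL;DR written by the post author."]

rouge = evaluate.load("rouge")          # reports rouge1, rouge2, rougeLsum, ...
bertscore = evaluate.load("bertscore")  # reports per-example precision/recall/f1

print(rouge.compute(predictions=predictions, references=references))
print(bertscore.compute(predictions=predictions, references=references, lang="en"))
```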