Update README.md
README.md
@@ -282,6 +282,7 @@ Results are shown for this model, as well as two variants:
 - **Dense:** [Llama-3.1-8B-tldr](https://huggingface.co/RedHatAI/Sparse-Llama-3.1-8B-tldr-2of4)
 - **Dense-quantized:** [Llama-3.1-8B-tldr-FP8-dynamic](https://huggingface.co/RedHatAI/Llama-3.1-8B-tldr-FP8-dynamic)
 - **Sparse-quantized:** [Sparse-Llama-3.1-8B-tldr-2of4-FP8-dynamic](https://huggingface.co/RedHatAI/Sparse-Llama-3.1-8B-tldr-2of4-FP8-dynamic)
+
 Although sparsity by itself does not significantly improve performance, when combined with quantization it results in up to 1.6x speedup.
 
 