RedHatAI/Sparse-Llama-3.1-8B-tldr-2of4


alexmarques committed on Jun 6

Commit f8d85de (verified) · 1 parent: e9c5a82

Update README.md

Files changed (1)

README.md +1 -1

@@ -278,7 +278,7 @@ We evaluated the inference performance of this model using the first 1,000 sampl
 Benchmarking was conducted with [vLLM](https://docs.vllm.ai/en/latest/) version `0.9.0.1` and [GuideLLM](https://github.com/neuralmagic/guidellm) version `0.2.1`.
 The figure below presents the **mean end-to-end latency per request** across varying request rates.
-Results are shown for this model, as well as two variants:
+Results are shown for this model, as well as three variants:
 - **Dense:** [Llama-3.1-8B-tldr](https://huggingface.co/RedHatAI/Sparse-Llama-3.1-8B-tldr-2of4)
 - **Dense-quantized:** [Llama-3.1-8B-tldr-FP8-dynamic](https://huggingface.co/RedHatAI/Llama-3.1-8B-tldr-FP8-dynamic)
 - **Sparse-quantized:** [Sparse-Llama-3.1-8B-tldr-2of4-FP8-dynamic](https://huggingface.co/RedHatAI/Sparse-Llama-3.1-8B-tldr-2of4-FP8-dynamic)