Update README.md
Browse files
README.md
CHANGED
@@ -278,7 +278,7 @@ We evaluated the inference performance of this model using the first 1,000 sampl
|
|
278 |
Benchmarking was conducted with [vLLM](https://docs.vllm.ai/en/latest/) version `0.9.0.1` and [GuideLLM](https://github.com/neuralmagic/guidellm) version `0.2.1`.
|
279 |
|
280 |
The figure below presents the **mean end-to-end latency per request** across varying request rates.
|
281 |
-
Results are shown for this model, as well as
|
282 |
- **Dense:** [Llama-3.1-8B-tldr](https://huggingface.co/RedHatAI/Sparse-Llama-3.1-8B-tldr-2of4)
|
283 |
- **Dense-quantized:** [Llama-3.1-8B-tldr-FP8-dynamic](https://huggingface.co/RedHatAI/Llama-3.1-8B-tldr-FP8-dynamic)
|
284 |
- **Sparse-quantized:** [Sparse-Llama-3.1-8B-tldr-2of4-FP8-dynamic](https://huggingface.co/RedHatAI/Sparse-Llama-3.1-8B-tldr-2of4-FP8-dynamic)
|
|
|
278 |
Benchmarking was conducted with [vLLM](https://docs.vllm.ai/en/latest/) version `0.9.0.1` and [GuideLLM](https://github.com/neuralmagic/guidellm) version `0.2.1`.
|
279 |
|
280 |
The figure below presents the **mean end-to-end latency per request** across varying request rates.
|
281 |
+
Results are shown for this model, as well as three variants:
|
282 |
- **Dense:** [Llama-3.1-8B-tldr](https://huggingface.co/RedHatAI/Sparse-Llama-3.1-8B-tldr-2of4)
|
283 |
- **Dense-quantized:** [Llama-3.1-8B-tldr-FP8-dynamic](https://huggingface.co/RedHatAI/Llama-3.1-8B-tldr-FP8-dynamic)
|
284 |
- **Sparse-quantized:** [Sparse-Llama-3.1-8B-tldr-2of4-FP8-dynamic](https://huggingface.co/RedHatAI/Sparse-Llama-3.1-8B-tldr-2of4-FP8-dynamic)
|