Update README.md
README.md CHANGED

```diff
@@ -6,16 +6,12 @@ license: cc-by-nc-4.0
 
 Build the fastest OSS vllm-based speculative decoding system for your own model, using [ArcticTraining](https://github.com/snowflakedb/ArcticTraining) and [ArcticInference](https://github.com/snowflakedb/ArcticInference)!
 
-
+Throughput (tokens/s) of gpt-oss-120b on 8xH100 using vLLM below:
 
 | method                               | ShareGPT       | HumanEval    |
 |--------------------------------------|----------------|--------------|
-
-
-| VLLM V1 Eagle3                       | 77.7           | 85.3         |
-| VLLM V0 MLP-Speculator (IBM)         | 77.9           | 66.7         |
-| ArcticSpeculator                     | **172.4**      | **203.7**    |
--->
+| vLLM V1 Baseline                     | 220.2          | 220.7        |
+| ArcticSpeculator                     | **377.3**      | **400.0**    |
 
 For more details about ArcticSpeculator and how to use it:
```
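As a quick sanity check on the updated table, the new numbers imply roughly a 1.7x speedup on ShareGPT and 1.8x on HumanEval for ArcticSpeculator over the vLLM V1 baseline. A minimal sketch using only the figures from the diff:

```python
# Throughput figures (tokens/s) copied from the updated README table.
baseline = {"ShareGPT": 220.2, "HumanEval": 220.7}  # vLLM V1 Baseline
arctic = {"ShareGPT": 377.3, "HumanEval": 400.0}    # ArcticSpeculator

# Speedup of ArcticSpeculator over the baseline, per workload.
for workload in baseline:
    speedup = arctic[workload] / baseline[workload]
    print(f"{workload}: {speedup:.2f}x")
# → ShareGPT: 1.71x
# → HumanEval: 1.81x
```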