Commit 896ac45 (verified) by aurick · Parent: b958523

Update README.md

Files changed (1): README.md (+3 -7)
README.md CHANGED

@@ -6,16 +6,12 @@ license: cc-by-nc-4.0
 
 Build the fastest OSS vllm-based speculative decoding system for your own model, using [ArcticTraining](https://github.com/snowflakedb/ArcticTraining) and [ArcticInference](https://github.com/snowflakedb/ArcticInference)!
 
-<!--We compare the throughput (tokens/s) of existing vllm-based speculative decoding systems for Llama3.1-70B-Instruct on 8xH100 as below:
+Throughput (tokens/s) of gpt-oss-120b on 8xH100 using vLLM below:
 
 | method | ShareGPT | HumanEval |
 |--------------------------------------|----------------|--------------|
-| VLLM V1 Baseline | 84.1 | 84.1 |
-| VLLM V1 Eagle | 102.2 | 112.0 |
-| VLLM V1 Eagle3 | 77.7 | 85.3 |
-| VLLM V0 MLP-Speculator (IBM) | 77.9 | 66.7 |
-| ArcticSpeculator | **172.4** | **203.7** |
--->
+| vLLM V1 Baseline | 220.2 | 220.7 |
+| ArcticSpeculator | **377.3** | **400.0** |
 
 For more details about ArcticSpeculator and how to use it:
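As a quick illustration (not part of the commit), the speedup implied by the added table rows can be computed directly from the reported tokens/s figures:

```python
# Speedup of ArcticSpeculator over the vLLM V1 baseline, using the
# gpt-oss-120b throughput numbers (tokens/s on 8xH100) from the updated table.
baseline = {"ShareGPT": 220.2, "HumanEval": 220.7}
arctic = {"ShareGPT": 377.3, "HumanEval": 400.0}

for bench in baseline:
    speedup = arctic[bench] / baseline[bench]
    print(f"{bench}: {speedup:.2f}x")
# ShareGPT: 1.71x
# HumanEval: 1.81x
```

So the commit's new numbers correspond to roughly a 1.7-1.8x end-to-end throughput gain over the baseline on these two workloads.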