Update README.md
README.md CHANGED

```diff
@@ -6,16 +6,12 @@ license: cc-by-nc-4.0
 
 Build the fastest OSS vllm-based speculative decoding system for your own model, using [ArcticTraining](https://github.com/snowflakedb/ArcticTraining) and [ArcticInference](https://github.com/snowflakedb/ArcticInference)!
 
-
+Throughput (tokens/s) of gpt-oss-120b on 8xH100 using vLLM below:
 
 | method                               | ShareGPT       | HumanEval    |
 |--------------------------------------|----------------|--------------|
-
-
-| VLLM V1 Eagle3                       | 77.7           | 85.3         |
-| VLLM V0 MLP-Speculator (IBM)         | 77.9           | 66.7         |
-| ArcticSpeculator                     | **172.4**      | **203.7**    |
--->
+| vLLM V1 Baseline                     | 220.2          | 220.7        |
+| ArcticSpeculator                     | **377.3**      | **400.0**    |
 
 For more details about ArcticSpeculator and how to use it:
```
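As a quick sanity check on the updated table, the new numbers imply roughly a 1.7x speedup on ShareGPT and 1.8x on HumanEval for ArcticSpeculator over the vLLM V1 baseline. A minimal sketch using only the figures from the diff:

```python
# Throughput figures (tokens/s) copied from the updated README table.
baseline = {"ShareGPT": 220.2, "HumanEval": 220.7}  # vLLM V1 Baseline
arctic = {"ShareGPT": 377.3, "HumanEval": 400.0}    # ArcticSpeculator

# Speedup of ArcticSpeculator over the baseline, per workload.
for workload in baseline:
    speedup = arctic[workload] / baseline[workload]
    print(f"{workload}: {speedup:.2f}x")
# → ShareGPT: 1.71x
# → HumanEval: 1.81x
```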