Update vLLM eval results
README.md CHANGED
@@ -12,6 +12,13 @@ Please follow the license of the original model.
 
 ## How To Use
 
+### vLLM usage
+
+~~~bash
+vllm serve Intel/DeepSeek-V3.1-int4-mixed-AutoRound
+~~~
+
+
 ### INT4 Inference
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
@@ -130,6 +137,27 @@ autoround.quantize_and_save(format="auto_round", output_dir="tmp_autoround")
 
 ```
 
+## Evaluate Results
+
+| benchmark | backend | Intel/DeepSeek-V3.1-int4-mixed-AutoRound | deepseek-ai/DeepSeek-V3.1 |
+| :-------: | :-----: | :--------------------------------------: | :-----------------------: |
+| mmlu_pro  |  vllm   |                  0.7922                  |          0.7965           |
+
+```
+# key dependency version
+torch 2.8.0
+transformers 4.56.2
+lm_eval 0.4.9.1
+vllm 0.10.2rc3.dev291+g535d80056.precompiled
+
+# eval cmd
+CUDA_VISIBLE_DEVICES=0,1,2,3 VLLM_WORKER_MULTIPROC_METHOD=spawn \
+lm_eval --model vllm \
+--model_args pretrained=Intel/DeepSeek-V3.1-int4-mixed-AutoRound,dtype=bfloat16,trust_remote_code=False,tensor_parallel_size=4,gpu_memory_utilization=0.95 \
+--tasks mmlu_pro \
+--batch_size 4
+```
+
 
 ## Ethical Considerations and Limitations
 
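
For readers trying the new `### vLLM usage` section: `vllm serve` starts vLLM's OpenAI-compatible HTTP server. A minimal sketch of querying it, assuming vLLM's default bind address (`localhost:8000`) and the default served model name — both are vLLM defaults, not specified in this commit:

~~~bash
# Query the server started by `vllm serve Intel/DeepSeek-V3.1-int4-mixed-AutoRound`.
# Assumes the default host/port localhost:8000; adjust if overridden.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "Intel/DeepSeek-V3.1-int4-mixed-AutoRound",
        "messages": [{"role": "user", "content": "What is INT4 quantization?"}],
        "max_tokens": 128
      }'
~~~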
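
Similarly, as a follow-up to the `# eval cmd` block: lm-evaluation-harness accepts a `--limit` flag that caps the number of samples per task, which is a cheap way to confirm the 4-GPU vLLM setup loads the INT4 checkpoint before committing to the full mmlu_pro run. A minimal sketch — the `--limit 10` value is illustrative, not from the commit:

~~~bash
# Short smoke test before the full evaluation; same model args as the commit's eval cmd.
CUDA_VISIBLE_DEVICES=0,1,2,3 VLLM_WORKER_MULTIPROC_METHOD=spawn \
lm_eval --model vllm \
  --model_args pretrained=Intel/DeepSeek-V3.1-int4-mixed-AutoRound,dtype=bfloat16,trust_remote_code=False,tensor_parallel_size=4,gpu_memory_utilization=0.95 \
  --tasks mmlu_pro \
  --batch_size 4 \
  --limit 10  # illustrative sample cap, not part of the commit
~~~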