INC4AI committed on
Commit 52fa8af · verified · 1 Parent(s): 75ad630

Update vllm eval results

Files changed (1)
  1. README.md +28 -0
README.md CHANGED
@@ -12,6 +12,13 @@ Please follow the license of the original model.
 
 ## How To Use
 
+ ### vLLM usage
+
+ ~~~bash
+ vllm serve Intel/DeepSeek-V3.1-int4-mixed-AutoRound
+ ~~~
+
+
 ### INT4 Inference
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
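Note on the hunk above: `vllm serve` exposes an OpenAI-compatible HTTP API for the quantized checkpoint. As a minimal sketch of calling it (the `http://localhost:8000/v1` endpoint, the `openai` Python client, and the prompt are assumptions, not part of this commit), a request might look like:

```python
# Sketch of querying the server started by `vllm serve`; assumes vLLM's default
# OpenAI-compatible endpoint on localhost:8000 and the `openai` Python client.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # vLLM does not check the key

response = client.chat.completions.create(
    model="Intel/DeepSeek-V3.1-int4-mixed-AutoRound",
    messages=[{"role": "user", "content": "Briefly explain INT4 weight-only quantization."}],
    max_tokens=256,
)
print(response.choices[0].message.content)
```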
 
@@ -130,6 +137,27 @@ autoround.quantize_and_save(format="auto_round", output_dir="tmp_autoround")
 
 ```
 
+ ## Evaluate Results
+
+ | benchmark | backend | Intel/DeepSeek-V3.1-int4-mixed-AutoRound | deepseek-ai/DeepSeek-V3.1 |
+ | :-------: | :-----: | :--------------------------------------: | :-----------------------: |
+ | mmlu_pro  | vllm    |                  0.7922                  |          0.7965           |
+
+ ```
+ # key dependency version
+ torch 2.8.0
+ transformers 4.56.2
+ lm_eval 0.4.9.1
+ vllm 0.10.2rc3.dev291+g535d80056.precompiled
+
+ # eval cmd
+ CUDA_VISIBLE_DEVICES=0,1,2,3 VLLM_WORKER_MULTIPROC_METHOD=spawn \
+ lm_eval --model vllm \
+ --model_args pretrained=Intel/DeepSeek-V3.1-int4-mixed-AutoRound,dtype=bfloat16,trust_remote_code=False,tensor_parallel_size=4,gpu_memory_utilization=0.95 \
+ --tasks mmlu_pro \
+ --batch_size 4
+ ```
+
 
 ## Ethical Considerations and Limitations
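Note on the evaluation hunk: the score is produced with the lm_eval CLI shown in the eval cmd. As a rough equivalent sketch using the harness's Python API under the same settings (the `model_args` string mirrors the CLI flags; the final print line assumes the usual layout of the returned results dict and is not confirmed by this commit):

```python
# Sketch of reproducing the mmlu_pro run via lm_eval's Python API instead of the CLI.
# Export the same environment as the eval cmd first, e.g.:
#   CUDA_VISIBLE_DEVICES=0,1,2,3 VLLM_WORKER_MULTIPROC_METHOD=spawn
from lm_eval import simple_evaluate

results = simple_evaluate(
    model="vllm",
    model_args=(
        "pretrained=Intel/DeepSeek-V3.1-int4-mixed-AutoRound,"
        "dtype=bfloat16,trust_remote_code=False,"
        "tensor_parallel_size=4,gpu_memory_utilization=0.95"
    ),
    tasks=["mmlu_pro"],
    batch_size=4,
)
print(results["results"]["mmlu_pro"])  # per-metric scores for the task
```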