Commit 42a06be by ekurtic (verified) · 1 parent: 4cd653b

Upload folder using huggingface_hub

Files changed (1)
  1. README.md +11 -5
README.md CHANGED
@@ -54,15 +54,21 @@ vllm serve RedHatAI/Devstral-Small-2507-FP8-Dynamic --tensor-parallel-size 1 --t
  ## Evaluation

  The model was evaluated on popular coding tasks (HumanEval, HumanEval+, MBPP, MBPP+) via [EvalPlus](https://github.com/evalplus/evalplus) and vllm backend (v0.10.1.1).
- For evaluations, we run greedy sampling and report pass@1
+ For evaluations, we run greedy sampling and report pass@1. The command to reproduce evals:
+ ```bash
+ evalplus.evaluate --model "RedHatAI/Devstral-Small-2507-FP8-Dynamic" \
+ --dataset [humaneval|mbpp] \
+ --base-url http://localhost:8000/v1 \
+ --backend openai --greedy
+ ```


  ### Accuracy

  | | Recovery (%) | mistralai/Devstral-Small-2507 | RedHatAI/Devstral-Small-2507-FP8-Dynamic<br>(this model) |
  | --------------------------- | :----------: | :------------------: | :--------------------------------------------------: |
- | HumanEval | 98.50 | 89.0 | 89.6 |
- | HumanEval+ | 99.88 | 81.1 | 82.9 |
- | MBPP | 101.21 | 77.5 | 75.4 |
- | MBPP+ | 101.21 | 66.1 | 64.8 |
+ | HumanEval | 100.67 | 89.0 | 89.6 |
+ | HumanEval+ | 102.22 | 81.1 | 82.9 |
+ | MBPP | 97.29 | 77.5 | 75.4 |
+ | MBPP+ | 98.03 | 66.1 | 64.8 |
  | **Average Score** | **99.68** | **78.43** | **78.18** |
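The README does not define the Recovery (%) column. The corrected values in this change are consistent with recovery = 100 × (FP8-Dynamic score ÷ mistralai/Devstral-Small-2507 score) per task, with the Average row taking the same ratio of the two average scores. Below is a minimal shell/awk sketch under that assumption; `recovery_table` is a hypothetical helper for illustration, not part of the repository.

```shell
#!/usr/bin/env bash
# Sketch: recompute the "Recovery (%)" column of the Accuracy table.
# ASSUMPTION (not stated in the README): recovery = 100 * FP8 score / baseline score,
# and the Average row uses the ratio of the two average scores.

# Hypothetical helper. Reads lines of "<task> <baseline score> <FP8-Dynamic score>"
# from stdin and prints "<task> <recovery>" per line, then "Average <recovery>".
recovery_table() {
  awk '
    {
      # Per-task recovery: quantized score relative to the baseline score.
      printf "%s %.2f\n", $1, 100 * $3 / $2
      base_sum += $2
      fp8_sum += $3
    }
    END {
      # Ratio of sums equals ratio of averages (the row count cancels out).
      if (base_sum > 0) printf "Average %.2f\n", 100 * fp8_sum / base_sum
    }
  '
}

# Scores copied from the Accuracy table: task, baseline, FP8-Dynamic.
recovery_table <<'EOF'
HumanEval 89.0 89.6
HumanEval+ 81.1 82.9
MBPP 77.5 75.4
MBPP+ 66.1 64.8
EOF
```

Running it prints 100.67, 102.22, 97.29 and 98.03 for the four tasks and 99.68 for the average, matching the `+` rows; the replaced values (98.50, 99.88, 101.21, 101.21) do not follow from the scores under this definition. Averaging the four per-task recoveries instead would give about 99.55, so the table's 99.68 appears to be the ratio of the average scores rather than the mean of the per-task ratios.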