Upload folder using huggingface_hub
README.md
CHANGED
@@ -54,15 +54,21 @@ vllm serve RedHatAI/Devstral-Small-2507-FP8-Dynamic --tensor-parallel-size 1 --t
 ## Evaluation
 
 The model was evaluated on popular coding tasks (HumanEval, HumanEval+, MBPP, MBPP+) via [EvalPlus](https://github.com/evalplus/evalplus) and vllm backend (v0.10.1.1).
-For evaluations, we run greedy sampling and report pass@1
+For evaluations, we run greedy sampling and report pass@1. The command to reproduce evals:
+```bash
+evalplus.evaluate --model "RedHatAI/Devstral-Small-2507-FP8-Dynamic" \
+--dataset [humaneval|mbpp] \
+--base-url http://localhost:8000/v1 \
+--backend openai --greedy
+```
 
 
 ### Accuracy
 
 | | Recovery (%) | mistralai/Devstral-Small-2507 | RedHatAI/Devstral-Small-2507-FP8-Dynamic<br>(this model) |
 | --------------------------- | :----------: | :------------------: | :--------------------------------------------------: |
-| HumanEval |
-| HumanEval+ |
-| MBPP |
-| MBPP+ |
+| HumanEval | 100.67 | 89.0 | 89.6 |
+| HumanEval+ | 102.22 | 81.1 | 82.9 |
+| MBPP | 97.29 | 77.5 | 75.4 |
+| MBPP+ | 98.03 | 66.1 | 64.8 |
 | **Average Score** | **99.68** | **78.43** | **78.18** |
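
Review note: the README does not define the Recovery (%) column, but the added rows are consistent with the quantized model's pass@1 divided by the baseline's pass@1, times 100. A minimal sketch for re-checking the rows under that assumption follows; the `recovery` helper is hypothetical and not part of EvalPlus or the model repository.

```shell
#!/bin/sh
# Assumed definition (inferred from the table, not stated in the README):
#   recovery (%) = quantized pass@1 / baseline pass@1 * 100
# `recovery` is a hypothetical helper, not an EvalPlus command.
recovery() {
  # $1 = quantized model score, $2 = baseline model score
  awk -v q="$1" -v b="$2" 'BEGIN { printf "%.2f\n", q / b * 100 }'
}

# Scores copied from the added table rows: quantized first, baseline second.
recovery 89.6 89.0   # HumanEval
recovery 82.9 81.1   # HumanEval+
recovery 75.4 77.5   # MBPP
recovery 64.8 66.1   # MBPP+
```

Values above 100 mean the FP8 model scored slightly higher than the baseline on that benchmark, which is plausible given how small the differences are.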