ekurtic commited on
Commit
4cd653b
·
verified ·
1 Parent(s): ab359a8

Upload folder using huggingface_hub

Browse files
Files changed (1) hide show
  1. README.md +68 -3
README.md CHANGED
@@ -1,3 +1,68 @@
1
- ---
2
- license: apache-2.0
3
- ---
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ language:
3
+ - en
4
+ base_model:
5
+ - mistralai/Devstral-Small-2507
6
+ pipeline_tag: text-generation
7
+ tags:
8
+ - mistral
9
+ - neuralmagic
10
+ - redhat
11
+ - llmcompressor
12
+ - quantized
13
+ - FP8
14
+ - compressed-tensors
15
+ license: mit
16
+ license_name: mit
17
+ name: RedHatAI/Devstral-Small-2507
18
+ description: This model was obtained by quantizing weights and activations of Devstral-Small-2507 to FP8 data type.
19
+ readme: https://huggingface.co/RedHatAI/Devstral-Small-2507-FP8-Dynamic/main/README.md
20
+ tasks:
21
+ - text-to-text
22
+ provider: mistralai
23
+ ---
24
+
25
+ # Devstral-Small-2507-FP8-Dynamic
26
+
27
+ ## Model Overview
28
+ - **Model Architecture:** MistralForCausalLM
29
+ - **Input:** Text
30
+ - **Output:** Text
31
+ - **Model Optimizations:**
32
+ - **Activation quantization:** FP8
33
+ - **Weight quantization:** FP8
34
+ - **Release Date:** 08/28/2025
35
+ - **Version:** 1.0
36
+ - **Model Developers:** Red Hat (Neural Magic)
37
+
38
+
39
+ ### Model Optimizations
40
+
41
+ This model was obtained by quantizing weights and activations of [Devstral-Small-2507](https://huggingface.co/mistralai/Devstral-Small-2507) to FP8 data type.
42
+ This optimization reduces the number of bits used to represent weights and activations from 16 to 8, reducing GPU memory requirements (by approximately 50%).
43
+ Weight quantization also reduces disk size requirements by approximately 50%.
44
+
45
+
46
+ ## Deployment
47
+
48
+ This model can be deployed efficiently using the [vLLM](https://docs.vllm.ai/en/latest/) backend, as shown in the example below.
49
+
50
+ ```bash
51
+ vllm serve RedHatAI/Devstral-Small-2507-FP8-Dynamic --tensor-parallel-size 1 --tokenizer_mode mistral
52
+ ```
53
+
54
+ ## Evaluation
55
+
56
+ The model was evaluated on popular coding tasks (HumanEval, HumanEval+, MBPP, MBPP+) via [EvalPlus](https://github.com/evalplus/evalplus) and vllm backend (v0.10.1.1).
57
+ For evaluations, we run greedy sampling and report pass@1
58
+
59
+
60
+ ### Accuracy
61
+
62
+ | | Recovery (%) | mistralai/Devstral-Small-2507 | RedHatAI/Devstral-Small-2507-FP8-Dynamic<br>(this model) |
63
+ | --------------------------- | :----------: | :------------------: | :--------------------------------------------------: |
64
+ | HumanEval | 98.50 | 89.0 | 89.6 |
65
+ | HumanEval+ | 99.88 | 81.1 | 82.9 |
66
+ | MBPP | 101.21 | 77.5 | 75.4 |
67
+ | MBPP+ | 101.21 | 66.1 | 64.8 |
68
+ | **Average Score** | **99.68** | **78.43** | **78.18** |