---
license: apache-2.0
language:
- en
- de
- es
- fr
- ja
- pt
- ar
- cs
- it
- ko
- nl
- zh
base_model:
- ibm-granite/granite-3.3-8b-instruct
pipeline_tag: text-generation
---
# granite-3.3-8b-instruct-FP8-Dynamic Model Card


This model was optimized for use with vLLM on NVIDIA GPUs with compute capability ≥ 8.0 (Ampere: A100, A10, RTX 3090, etc.). Because Ampere GPUs lack native FP8 tensor cores, vLLM serves the checkpoint through the weight-only FP8 Marlin kernel, providing an efficient W8A16 configuration.
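The checkpoint loads like any other model in vLLM. A minimal sketch, assuming the model is available under an id like the illustrative one below:

```python
from vllm import LLM, SamplingParams

# Illustrative model id; replace with the actual Hub id or local path
# of this quantized checkpoint.
llm = LLM(model="granite-3.3-8b-instruct-FP8-Dynamic")

params = SamplingParams(temperature=0.0, max_tokens=128)
outputs = llm.generate(["What is FP8 quantization?"], params)
print(outputs[0].outputs[0].text)
```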

The model was quantized with `llmcompressor` 0.6.0.1 using the following recipe:
```python
from llmcompressor.modifiers.quantization import QuantizationModifier

# Dynamic FP8: quantize all Linear layers, keep the output head in full precision.
recipe = QuantizationModifier(
    targets="Linear",
    scheme="FP8_DYNAMIC",
    ignore=["lm_head"],
)
```
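For reference, a minimal end-to-end sketch of how such a recipe could be applied with `llmcompressor`'s `oneshot` entry point; the base model id comes from the card metadata, while the output directory name is illustrative:

```python
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ibm-granite/granite-3.3-8b-instruct"
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

recipe = QuantizationModifier(
    targets="Linear", scheme="FP8_DYNAMIC", ignore=["lm_head"]
)

# FP8_DYNAMIC computes activation scales at runtime, so the oneshot pass
# needs no calibration dataset.
oneshot(model=model, recipe=recipe)

# Illustrative output path; save_compressed stores the weights in
# compressed-tensors format.
save_dir = "granite-3.3-8b-instruct-FP8-Dynamic"
model.save_pretrained(save_dir, save_compressed=True)
tokenizer.save_pretrained(save_dir)
```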