granite-3.3-8b-instruct-FP8-Dynamic Model Card
This model is optimized for use with vLLM on NVIDIA GPUs with compute capability >= 8.0 (Ampere: A100, A10, RTX 3090, etc.). On these GPUs it uses the weight-only FP8 Marlin kernel, providing an efficient W8A16 configuration.
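As a quick sanity check, the checkpoint can be loaded with vLLM's offline API. A minimal sketch; the prompt and sampling parameters below are illustrative, not part of this model card:

```python
from vllm import LLM, SamplingParams

# Load the quantized checkpoint; vLLM picks the FP8 Marlin kernel
# automatically on pre-Hopper GPUs without native FP8 support.
llm = LLM(model="sayed0am/granite-3.3-8b-instruct-FP8-Dynamic")

params = SamplingParams(temperature=0.0, max_tokens=128)
outputs = llm.generate(["What is FP8 quantization?"], params)
print(outputs[0].outputs[0].text)
```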
The model was quantized with llmcompressor 0.6.0.1 using the following recipe:

```python
from llmcompressor.modifiers.quantization import QuantizationModifier

recipe = QuantizationModifier(
    targets="Linear", scheme="FP8_DYNAMIC", ignore=["lm_head"])
```
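For reference, a minimal sketch of how such a recipe is typically applied with llmcompressor's oneshot entry point. The exact script used for this checkpoint was not published; the model ID and output directory below are assumptions:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

MODEL_ID = "ibm-granite/granite-3.3-8b-instruct"  # source checkpoint (per the model tree)
SAVE_DIR = "granite-3.3-8b-instruct-FP8-Dynamic"  # assumed output path

model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# FP8_DYNAMIC quantizes weights statically and activations dynamically,
# so no calibration dataset is needed.
recipe = QuantizationModifier(
    targets="Linear", scheme="FP8_DYNAMIC", ignore=["lm_head"])

oneshot(model=model, recipe=recipe)

model.save_pretrained(SAVE_DIR)
tokenizer.save_pretrained(SAVE_DIR)
```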
Model tree for sayed0am/granite-3.3-8b-instruct-FP8-Dynamic
- Base model: ibm-granite/granite-3.3-8b-base
- Fine-tuned: ibm-granite/granite-3.3-8b-instruct