---
license: apache-2.0
language:
- en
- de
- es
- fr
- ja
- pt
- ar
- cs
- it
- ko
- nl
- zh
base_model:
- ibm-granite/granite-3.3-8b-instruct
pipeline_tag: text-generation
---

# granite-3.3-8b-instruct-FP8-Dynamic Model Card

This model is optimized for use with vLLM on NVIDIA GPUs with compute capability >= 8.0 (Ampere: A100, A10, RTX 3090, etc.). On these GPUs it uses a weight-only FP8 Marlin kernel, providing an efficient W8A16 configuration.

Quantization was performed with `llmcompressor 0.6.0.1` using the following recipe:

```python
from llmcompressor.modifiers.quantization import QuantizationModifier

recipe = QuantizationModifier(
    targets="Linear",        # quantize all Linear layers
    scheme="FP8_DYNAMIC",    # FP8 weights, dynamic per-token activation scales
    ignore=["lm_head"],      # keep the output head in higher precision
)
```
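As a minimal sketch, the quantized checkpoint can be loaded with vLLM's offline inference API. The model path below is a placeholder for wherever this checkpoint is saved or published; running it requires a compatible NVIDIA GPU.

```python
from vllm import LLM, SamplingParams

# Placeholder path: point this at the local directory or Hub repo id
# containing the FP8-quantized checkpoint.
llm = LLM(model="granite-3.3-8b-instruct-FP8-Dynamic")

params = SamplingParams(temperature=0.0, max_tokens=64)
outputs = llm.generate(["What is FP8 quantization?"], params)
print(outputs[0].outputs[0].text)
```

vLLM detects the FP8 checkpoint format automatically; on Ampere-class GPUs (compute capability below 8.9) it dispatches to the weight-only FP8 Marlin kernel described above.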