---
license: apache-2.0
language:
- en
- de
- es
- fr
- ja
- pt
- ar
- cs
- it
- ko
- nl
- zh
base_model:
- ibm-granite/granite-3.3-8b-instruct
pipeline_tag: text-generation
---

# granite-3.3-8b-instruct-FP8-Dynamic Model Card
This model is optimized for use with vLLM on NVIDIA GPUs with compute capability 8.0 or higher (Ampere and newer: A100, A10, RTX 3090, etc.). On GPUs without native FP8 support, vLLM runs it with a weight-only FP8 Marlin kernel, providing an efficient W8A16 configuration.
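A minimal sketch of loading the checkpoint with vLLM's offline `LLM` API; the repo id below is assumed from this card's title, so adjust it to the actual published path:

```python
# Assumed repo id, taken from this card's title; replace with the
# actual Hugging Face path of the quantized checkpoint.
MODEL_ID = "granite-3.3-8b-instruct-FP8-Dynamic"

if __name__ == "__main__":
    # Deferred import: building the vLLM engine requires a CUDA GPU.
    from vllm import LLM, SamplingParams

    # vLLM reads the quantization config from the checkpoint and selects
    # the appropriate FP8 kernel (Marlin W8A16 on pre-Hopper GPUs).
    llm = LLM(model=MODEL_ID)
    params = SamplingParams(temperature=0.7, max_tokens=128)
    outputs = llm.generate(["Explain FP8 quantization briefly."], params)
    print(outputs[0].outputs[0].text)
```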
The model was quantized with llmcompressor 0.6.0.1, using the following recipe:

```python
from llmcompressor.modifiers.quantization import QuantizationModifier

recipe = QuantizationModifier(
    targets="Linear", scheme="FP8_DYNAMIC", ignore=["lm_head"]
)
```
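For reference, the recipe above is applied with llmcompressor's `oneshot` entry point. The following is a sketch, not the exact command used to produce this checkpoint; the output directory name is an assumption:

```python
# Base model from this card's metadata; the output path is an assumption.
BASE_MODEL = "ibm-granite/granite-3.3-8b-instruct"
OUTPUT_DIR = "granite-3.3-8b-instruct-FP8-Dynamic"

if __name__ == "__main__":
    # Deferred imports: running oneshot downloads the 8B base model
    # and needs a GPU.
    from llmcompressor import oneshot
    from llmcompressor.modifiers.quantization import QuantizationModifier

    # FP8_DYNAMIC needs no calibration dataset: weights are quantized
    # statically and activations are quantized dynamically per token at
    # inference time, so oneshot only rewrites the weights.
    recipe = QuantizationModifier(
        targets="Linear", scheme="FP8_DYNAMIC", ignore=["lm_head"]
    )
    oneshot(model=BASE_MODEL, recipe=recipe, output_dir=OUTPUT_DIR)
```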