---
license: apache-2.0
language:
- en
- de
- es
- fr
- ja
- pt
- ar
- cs
- it
- ko
- nl
- zh
base_model:
- ibm-granite/granite-3.3-8b-instruct
pipeline_tag: text-generation
---
|
# granite-3.3-8b-instruct-FP8-Dynamic Model Card |
|
|
|
|
|
|
This model is optimized for use with vLLM on NVIDIA GPUs with compute capability 8.0 or newer (Ampere-class: A100, A10, RTX 3090, etc.). On these GPUs, vLLM serves the FP8 weights through the weight-only FP8 Marlin kernel, giving an efficient W8A16 configuration.
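
As a minimal sketch, the checkpoint can be served through vLLM's Python API as shown below. The repository id is a placeholder (substitute the actual path of this model), and the prompt and sampling parameters are illustrative:

```python
from vllm import LLM, SamplingParams

# vLLM reads the quantization config from the checkpoint; on Ampere-class
# GPUs the FP8 weights are served via the weight-only Marlin kernel.
llm = LLM(model="<your-org>/granite-3.3-8b-instruct-FP8-Dynamic")  # placeholder repo id

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Briefly explain FP8 quantization."], params)
print(outputs[0].outputs[0].text)
```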
|
|
|
The model was quantized with `llmcompressor` 0.6.0.1 using the following recipe:
|
```python
from llmcompressor.modifiers.quantization import QuantizationModifier

# Quantize all Linear layers to FP8, keeping the lm_head in full precision.
recipe = QuantizationModifier(
    targets="Linear", scheme="FP8_DYNAMIC", ignore=["lm_head"]
)
```
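
For context, a minimal sketch of how such a recipe is typically applied with `llmcompressor`'s `oneshot` entrypoint follows; the base model id is taken from the metadata above, while the output directory is an illustrative assumption:

```python
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

recipe = QuantizationModifier(
    targets="Linear", scheme="FP8_DYNAMIC", ignore=["lm_head"]
)

# FP8_DYNAMIC is data-free: weights are quantized per-channel ahead of time
# and activations per-token at runtime, so no calibration dataset is passed.
oneshot(
    model="ibm-granite/granite-3.3-8b-instruct",
    recipe=recipe,
    output_dir="granite-3.3-8b-instruct-FP8-Dynamic",  # illustrative path
)
```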
|
|