Update README.md
README.md
```diff
@@ -37,7 +37,7 @@ This model was obtained by quantizing the weights of [Qwen3-0.6B](https://huggin
 This optimization reduces the number of bits per parameter from 16 to 4, reducing the disk size and GPU memory requirements by approximately 75%.
 
 Only the weights of the linear operators within transformers blocks are quantized.
-Weights are quantized using a asymmetric per-group scheme, with group size
+Weights are quantized using an asymmetric per-group scheme, with group size 64.
 The [GPTQ](https://arxiv.org/abs/2210.17323) algorithm is applied for quantization, as implemented in the [llm-compressor](https://github.com/vllm-project/llm-compressor) library.
 
 
```