Update README.md
README.md
```diff
@@ -37,7 +37,7 @@ This model was obtained by quantizing the weights of [Qwen3-0.6B](https://huggin
 This optimization reduces the number of bits per parameter from 16 to 4, reducing the disk size and GPU memory requirements by approximately 75%.
 
 Only the weights of the linear operators within transformers blocks are quantized.
-Weights are quantized using a asymmetric per-group scheme, with group size
+Weights are quantized using an asymmetric per-group scheme, with group size 64.
 The [GPTQ](https://arxiv.org/abs/2210.17323) algorithm is applied for quantization, as implemented in the [llm-compressor](https://github.com/vllm-project/llm-compressor) library.
 
 
```