alexmarques committed
Commit 3631931 · verified · 1 Parent(s): d6fe5b6

Update README.md

Files changed (1): README.md +1 -1
README.md CHANGED
@@ -37,7 +37,7 @@ This model was obtained by quantizing the weights of [Qwen3-0.6B](https://huggin
  This optimization reduces the number of bits per parameter from 16 to 4, reducing the disk size and GPU memory requirements by approximately 75%.

  Only the weights of the linear operators within transformer blocks are quantized.
- Weights are quantized using an asymmetric per-group scheme, with group size 128.
+ Weights are quantized using an asymmetric per-group scheme, with group size 64.
  The [GPTQ](https://arxiv.org/abs/2210.17323) algorithm is applied for quantization, as implemented in the [llm-compressor](https://github.com/vllm-project/llm-compressor) library.
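For context, the edit above tightens the quantization granularity from 128 to 64 weights per scale/zero-point pair. Below is a minimal sketch of how such a recipe might be expressed with llm-compressor; the specific names (`GPTQModifier`, `config_groups`, `oneshot`, the calibration dataset and sample count) are assumptions drawn from the library's public examples, not from this commit.

```python
# Hedged sketch of a 4-bit asymmetric per-group GPTQ recipe (group size 64)
# with llm-compressor. API details may differ across library versions; this
# is illustrative, not the script used to produce this model.
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import GPTQModifier

recipe = GPTQModifier(
    ignore=["lm_head"],  # assumption: leave the output head in 16-bit
    config_groups={
        "group_0": {
            "targets": ["Linear"],  # only linear operators in transformer blocks
            "weights": {
                "num_bits": 4,       # 16-bit -> 4-bit weights (~75% smaller)
                "type": "int",
                "symmetric": False,  # asymmetric scheme, as the README states
                "strategy": "group",
                "group_size": 64,    # the value updated by this commit
            },
        }
    },
)

# Hypothetical one-shot calibration run; dataset and sample count are assumptions.
oneshot(
    model="Qwen/Qwen3-0.6B",
    dataset="open_platypus",
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=512,
)
```

Halving the group size means each quantization scale and zero point covers 64 weights instead of 128, which typically recovers a little accuracy at the cost of slightly more stored metadata.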