Question About Model Quantization and Pruning Approach
#1 by RO-KO
Hi there,
I came across your impressive project and have a quick question out of curiosity. I noticed that the original model is in bfloat16 (bf16), while the quantized version appears to be in float16 (fp16). Was pruning part of your quantization process, or is there another reason for the change in precision format?
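For reference, this is roughly how I compared the dtypes of the two checkpoints (the repo ids below are placeholders, not your actual repos):

```python
import torch
from transformers import AutoModel

# Placeholder ids; substitute the actual original and quantized repos.
original = AutoModel.from_pretrained("org/original-model", torch_dtype="auto")
converted = AutoModel.from_pretrained("org/quantized-model", torch_dtype="auto")

# Collect the distinct parameter dtypes in each checkpoint.
print({p.dtype for p in original.parameters()})   # e.g. {torch.bfloat16}
print({p.dtype for p in converted.parameters()})  # e.g. {torch.float16}
```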
Would you mind sharing a bit more detail about your quantization methodology or overall workflow? I'd really appreciate any insights you could offer.
Thanks in advance, and great work again!