Huihui-gpt-oss-20b-BF16-abliterated - W8A16 Quantized Version

This is the W8A16 quantized version of the huihui-ai/Huihui-gpt-oss-20b-BF16-abliterated model. It was quantized using LLM Compressor with an MoE-specific quantization approach.

Model Details

  • Base model: huihui-ai/Huihui-gpt-oss-20b-BF16-abliterated
  • Quantization scheme: W8A16 (8-bit weights, 16-bit activations) via LLM Compressor
  • Model size: 20.4B parameters
  • Tensor types: BF16, I64, I32
  • Weight format: Safetensors (compressed-tensors)

Usage

This quantized model can be used with vLLM and other inference frameworks that support the compressed-tensors format.

# Example usage with vLLM (requires a vLLM build with compressed-tensors support)
from vllm import LLM, SamplingParams

llm = LLM("groxaxo/Huihui-gpt-oss-20b-BF16-abliterated-W8A16")
outputs = llm.generate("My name is", SamplingParams(max_tokens=64))
print(outputs[0].outputs[0].text)
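
Since the underlying gpt-oss model is a chat model, chat-style prompting may work better than raw completion. The sketch below is illustrative: it assumes a recent vLLM version that provides LLM.chat and reuses the repository name shown above.

# Chat-style example (recent vLLM versions with LLM.chat)
from vllm import LLM, SamplingParams

llm = LLM("groxaxo/Huihui-gpt-oss-20b-BF16-abliterated-W8A16")
messages = [{"role": "user", "content": "Introduce yourself in one sentence."}]
outputs = llm.chat(messages, SamplingParams(max_tokens=128))
print(outputs[0].outputs[0].text)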

Quantization Process

The model was quantized using the MoE-specific approach in LLM Compressor, which preserves full precision for sensitive gate layers while quantizing the rest of the network to W8A16.
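
The exact recipe used for this release is not included in the card. As a rough illustration, the snippet below sketches how a data-free W8A16 recipe with LLM Compressor could look; the import path, the QuantizationModifier settings, and especially the ignore patterns for the gate layers are assumptions and would need to match the model's actual module names.

# Hypothetical LLM Compressor recipe sketch; the ignore patterns are illustrative, not the actual recipe
from transformers import AutoModelForCausalLM, AutoTokenizer
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

model_id = "huihui-ai/Huihui-gpt-oss-20b-BF16-abliterated"
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

recipe = QuantizationModifier(
    targets="Linear",
    scheme="W8A16",                   # 8-bit weights, 16-bit activations
    ignore=["lm_head", "re:.*gate"],  # keep lm_head and MoE gate layers in full precision (assumed pattern)
)

oneshot(model=model, recipe=recipe)

save_dir = "Huihui-gpt-oss-20b-BF16-abliterated-W8A16"
model.save_pretrained(save_dir, save_compressed=True)
tokenizer.save_pretrained(save_dir)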

Benefits

  • Reduced model size compared to the BF16 version
  • Maintains good performance despite quantization
  • Compatible with vLLM for efficient inference

License

This model is licensed under the Apache 2.0 license, same as the original model.
