# Huihui-gpt-oss-20b-BF16-abliterated - W8A16 Quantized Version
This is a W8A16 quantized version of the huihui-ai/Huihui-gpt-oss-20b-BF16-abliterated model, quantized with LLM Compressor using its MoE-specific quantization approach.
## Model Details
- Original Model: huihui-ai/Huihui-gpt-oss-20b-BF16-abliterated
- Quantization Method: W8A16 (8-bit weights, 16-bit activations)
- Quantization Tool: LLM Compressor
- Format: safetensors (compressed-tensors format; see the inspection sketch below)
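
The quantization metadata that LLM Compressor writes into `config.json` can be inspected directly. The snippet below is a minimal sketch, assuming the repository id shown on this page; the exact keys under `quantization_config` depend on the compressed-tensors version used at export time.

```python
# Sketch: inspect the quantization_config written by LLM Compressor.
# The exact keys depend on the compressed-tensors version used at export.
import json
from huggingface_hub import hf_hub_download

cfg_path = hf_hub_download("groxaxo/Huihui-gpt-oss-20b-BF16-abliterated-W8A16", "config.json")
with open(cfg_path) as f:
    cfg = json.load(f)
print(json.dumps(cfg.get("quantization_config", {}), indent=2))
```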
## Usage
This quantized model can be used with vLLM and other inference frameworks that support the compressed-tensors format.
```python
# Example usage with vLLM (requires a vLLM build that supports compressed-tensors)
from vllm import LLM, SamplingParams

llm = LLM("groxaxo/Huihui-gpt-oss-20b-BF16-abliterated-W8A16")
outputs = llm.generate("My name is", SamplingParams(max_tokens=64))
print(outputs[0].outputs[0].text)
```
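
Outside vLLM, a recent Transformers release with the compressed-tensors package installed can also load the checkpoint directly; the sketch below assumes that integration is available in your environment.

```python
# Sketch: loading with Hugging Face Transformers, assuming the
# compressed-tensors integration is installed (pip install compressed-tensors).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "groxaxo/Huihui-gpt-oss-20b-BF16-abliterated-W8A16"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("My name is", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0]))
```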
## Quantization Process
The model was quantized with the MoE-specific approach in LLM Compressor, which keeps the sensitive gate (expert-router) layers in full precision while quantizing the rest of the network to W8A16.
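
A minimal sketch of such a recipe with LLM Compressor's `oneshot` API is shown below. The import paths and the `ignore` patterns for the gate/router modules are assumptions that vary with the LLM Compressor version and the model's module names; this is not the exact recipe used for this checkpoint.

```python
# Sketch of a data-free W8A16 quantization pass with LLM Compressor.
# The gate-layer ignore pattern is an assumption and must match the real
# module names of the MoE router layers in this architecture.
from transformers import AutoModelForCausalLM, AutoTokenizer
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

model_id = "huihui-ai/Huihui-gpt-oss-20b-BF16-abliterated"
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

recipe = QuantizationModifier(
    targets="Linear",
    scheme="W8A16",
    ignore=["lm_head", "re:.*gate"],  # keep sensitive gate/router layers in full precision
)

oneshot(model=model, recipe=recipe)

save_dir = "Huihui-gpt-oss-20b-BF16-abliterated-W8A16"
model.save_pretrained(save_dir, save_compressed=True)
tokenizer.save_pretrained(save_dir)
```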
## Benefits
- Reduced model size compared to the BF16 version (a rough estimate is sketched after this list)
- Maintains good performance despite quantization
- Compatible with vLLM for efficient inference
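
As a rough back-of-envelope estimate (assuming about 20B parameters, 2 bytes per weight in BF16 versus roughly 1 byte per weight at 8 bits, and ignoring the layers kept in full precision and the quantization scale metadata), the weight storage is cut roughly in half:

```python
# Back-of-envelope weight-size estimate; real checkpoint sizes differ because
# gate layers stay in BF16 and quantization adds scale metadata.
params = 20e9
bf16_gb = params * 2 / 1e9  # 2 bytes per parameter in BF16
w8_gb = params * 1 / 1e9    # ~1 byte per parameter with 8-bit weights
print(f"BF16 ~{bf16_gb:.0f} GB, W8A16 ~{w8_gb:.0f} GB")
```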
## License
This model is licensed under the Apache 2.0 license, same as the original model.