# Huihui-gpt-oss-20b-BF16-abliterated - W8A16 Quantized Version
This is a W8A16 quantized version of the huihui-ai/Huihui-gpt-oss-20b-BF16-abliterated model, quantized with LLM Compressor using its MoE-specific quantization approach.
## Model Details
- Original Model: huihui-ai/Huihui-gpt-oss-20b-BF16-abliterated
- Quantization Method: W8A16 (8-bit weights, 16-bit activations)
- Quantization Tool: LLM Compressor
- Format: safetensors (compressed-tensors format; see the inspection sketch below)
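
The quantization metadata that LLM Compressor writes into `config.json` can be inspected directly. The snippet below is a minimal sketch, assuming the repository id shown on this page; the exact keys under `quantization_config` depend on the compressed-tensors version used at export time.

```python
# Sketch: inspect the quantization_config written by LLM Compressor.
# The exact keys depend on the compressed-tensors version used at export.
import json
from huggingface_hub import hf_hub_download

cfg_path = hf_hub_download("groxaxo/Huihui-gpt-oss-20b-BF16-abliterated-W8A16", "config.json")
with open(cfg_path) as f:
    cfg = json.load(f)
print(json.dumps(cfg.get("quantization_config", {}), indent=2))
```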
## Usage
This quantized model can be used with vLLM and other inference frameworks that support the compressed-tensors format.
```python
# Example usage with vLLM (requires a vLLM build that supports compressed-tensors)
from vllm import LLM, SamplingParams

llm = LLM("groxaxo/Huihui-gpt-oss-20b-BF16-abliterated-W8A16")
outputs = llm.generate("My name is", SamplingParams(max_tokens=64))
print(outputs[0].outputs[0].text)
```
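
Outside vLLM, a recent Transformers release with the compressed-tensors package installed can also load the checkpoint directly; the sketch below assumes that integration is available in your environment.

```python
# Sketch: loading with Hugging Face Transformers, assuming the
# compressed-tensors integration is installed (pip install compressed-tensors).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "groxaxo/Huihui-gpt-oss-20b-BF16-abliterated-W8A16"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("My name is", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0]))
```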
## Quantization Process
The model was quantized with the MoE-specific approach in LLM Compressor, which keeps the sensitive gate (expert-router) layers in full precision while quantizing the rest of the network to W8A16.
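
A minimal sketch of such a recipe with LLM Compressor's `oneshot` API is shown below. The import paths and the `ignore` patterns for the gate/router modules are assumptions that vary with the LLM Compressor version and the model's module names; this is not the exact recipe used for this checkpoint.

```python
# Sketch of a data-free W8A16 quantization pass with LLM Compressor.
# The gate-layer ignore pattern is an assumption and must match the real
# module names of the MoE router layers in this architecture.
from transformers import AutoModelForCausalLM, AutoTokenizer
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

model_id = "huihui-ai/Huihui-gpt-oss-20b-BF16-abliterated"
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

recipe = QuantizationModifier(
    targets="Linear",
    scheme="W8A16",
    ignore=["lm_head", "re:.*gate"],  # keep sensitive gate/router layers in full precision
)

oneshot(model=model, recipe=recipe)

save_dir = "Huihui-gpt-oss-20b-BF16-abliterated-W8A16"
model.save_pretrained(save_dir, save_compressed=True)
tokenizer.save_pretrained(save_dir)
```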
## Benefits
- Reduced model size compared to the BF16 version (a rough estimate is sketched after this list)
- Maintains good performance despite quantization
- Compatible with vLLM for efficient inference
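
As a rough back-of-envelope estimate (assuming about 20B parameters, 2 bytes per weight in BF16 versus roughly 1 byte per weight at 8 bits, and ignoring the layers kept in full precision and the quantization scale metadata), the weight storage is cut roughly in half:

```python
# Back-of-envelope weight-size estimate; real checkpoint sizes differ because
# gate layers stay in BF16 and quantization adds scale metadata.
params = 20e9
bf16_gb = params * 2 / 1e9  # 2 bytes per parameter in BF16
w8_gb = params * 1 / 1e9    # ~1 byte per parameter with 8-bit weights
print(f"BF16 ~{bf16_gb:.0f} GB, W8A16 ~{w8_gb:.0f} GB")
```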
## License
This model is licensed under the Apache 2.0 license, same as the original model.