---
license: apache-2.0
pipeline_tag: text-generation
library_name: mlx
tags:
- vllm
- mlx
base_model: openai/gpt-oss-120b
---
**See gpt-oss-120b 6.5bit MLX in action - [demonstration video](https://youtube.com/xcreate)**

*In our testing, the q6.5bit quant typically achieves a perplexity of 1.128, matching q8 (1.128).*

| Quantization | Perplexity |
|:------------:|:----------:|
| **q2** | 41.293 |
| **q3** | 1.900 |
| **q4** | 1.168 |
| **q6** | 1.128 |
| **q8** | 1.128 |
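
Perplexity in the table is presumably the standard definition: the exponential of the mean negative log-likelihood per token, so a value of 1.0 would mean the model predicts every evaluation token with certainty, and lower is better. The minimal sketch below shows that calculation on hypothetical per-token log-probabilities. The evaluation dataset, context length, and tooling behind the numbers above are not stated here, so this illustrates the metric only and does not reproduce the testing setup.

```python
import math
from typing import Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity = exp(mean negative natural-log probability per token)."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


# Hypothetical log-probs, not real model output.
certain = [0.0, 0.0, 0.0]            # every token predicted with probability 1.0
coin_flip = [math.log(0.5)] * 4      # every token predicted with probability 0.5

print(perplexity(certain))           # exactly 1.0
print(perplexity(coin_flip))         # about 2.0
```

Because perplexity is an exponential of an average, a gap such as q4 (1.168) versus q8 (1.128) reflects a small but consistent drop in per-token confidence, while q2 (41.293) indicates the model has largely broken down.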
## Usage Notes
* Built with a modified version of [MLX](https://github.com/ml-explore/mlx) 0.26
* Memory usage: ~95 GB
* Expect ~60 tokens/s
* For more details, see the [demonstration video](https://youtube.com/xcreate) or visit [OpenAI gpt-oss-120b](https://huggingface.co/openai/gpt-oss-120b).
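
The ~95 GB memory figure lines up with a simple back-of-the-envelope estimate: parameter count × bits per weight ÷ 8 bytes. The sketch below assumes roughly 117B total parameters for gpt-oss-120b (OpenAI's published figure) and that every weight is stored at 6.5 bits on average. It ignores the KV cache, activations, and runtime overhead, and it does not reflect how the modified MLX build actually lays out the weights, so treat it as a sanity check only.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    bytes_total = num_params * bits_per_weight / 8
    return bytes_total / 1e9


# Assumption: ~117B total parameters, uniform average bits per weight.
NUM_PARAMS = 117e9

for bits in (4, 6.5, 8):
    print(f"{bits} bits/weight: ~{weight_memory_gb(NUM_PARAMS, bits):.1f} GB")
```

At 6.5 bits per weight this gives about 95.1 GB of weights, consistent with the usage note above; at 8 bits the same estimate rises to about 117 GB.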