Update README.md
README.md
CHANGED
@@ -70,8 +70,10 @@ Perplexity (PPL) streaming evaluation on WikiText-2 (raw, test); fast preset wit
 |----------------------|-----------------------|
 | MLX 8-bit (gs=32)    | 7.39                  |
 | MLX bf16 (reference) | 7.38                  |
+| MLX 6-bit (gs=64)    | 7.40                  |
 
 Notes:
+
 - Results from local runs on Apple Silicon using MLX; numbers vary slightly with tokenizer details, logits dtype, and token subset.
 - For more sensitive comparisons, use overlapping windows (e.g., `--stride 512`) and evaluate the full split.
 
@@ -89,6 +91,7 @@ python -m mlx_lm convert \
 ## Sibling & reference models
 
 - halley-ai/gpt-oss-120b-MLX-bf16 (non-quantized reference)
+- halley-ai/gpt-oss-120b-MLX-6bit-gs64 (smaller/faster variant)
 
 ## Limitations & biases
 
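For context on the second note in the PPL table, here is a minimal sketch of overlapping-window (strided) perplexity evaluation with mlx-lm. The model path, window size, and scoring loop are illustrative assumptions; only the stride value and the WikiText-2 (raw, test) split come from the README, and the repo's own eval preset may differ.

```python
# Hedged sketch: strided (overlapping-window) perplexity with mlx-lm.
# Assumptions: model path and window=2048; only stride=512 comes from the README.
import math
import mlx.core as mx
from mlx_lm import load
from datasets import load_dataset

model, tokenizer = load("halley-ai/gpt-oss-120b-MLX-6bit-gs64")  # example path
text = "\n\n".join(load_dataset("wikitext", "wikitext-2-raw-v1", split="test")["text"])
tokens = tokenizer.encode(text)

window, stride = 2048, 512  # window size is an assumed preset
total_nll, total_tokens, prev_end = 0.0, 0, 1
for begin in range(0, len(tokens), stride):
    end = min(begin + window, len(tokens))
    inputs = mx.array(tokens[begin:end])[None]  # (1, T)
    logits = model(inputs[:, :-1])              # next-token logits, (1, T-1, V)
    logprobs = mx.take_along_axis(
        logits - mx.logsumexp(logits, axis=-1, keepdims=True),
        inputs[:, 1:][..., None], axis=-1,
    ).squeeze(-1)                               # log p(target), (1, T-1)
    keep = end - prev_end  # score only tokens not counted by earlier windows
    total_nll += float(-logprobs[0, -keep:].sum())
    total_tokens += keep
    prev_end = end
    if end == len(tokens):
        break

print(f"PPL: {math.exp(total_nll / total_tokens):.2f}")
```

Overlapping windows give each scored token more left context than disjoint chunks, which is why the note recommends them for sensitive comparisons; the `keep` bookkeeping ensures no token is counted twice.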
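A hedged sketch of how the new 6-bit sibling could be produced with the `python -m mlx_lm convert` command referenced in the hunk above. The Hugging Face source path and output directory are assumptions; the bit width and group size follow from the model name.

```bash
# Hedged sketch: quantizing the reference weights to 6-bit, group size 64.
# Source path and output directory are assumptions, not the repo's recipe.
python -m mlx_lm convert \
  --hf-path openai/gpt-oss-120b \
  --mlx-path gpt-oss-120b-MLX-6bit-gs64 \
  -q --q-bits 6 --q-group-size 64
```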