Update README.md
README.md
CHANGED
@@ -70,8 +70,10 @@ Perplexity (PPL) streaming evaluation on WikiText-2 (raw, test); fast preset wit
 |----------------------|-----------------------|
 | MLX 8-bit (gs=32)    | 7.39                  |
 | MLX bf16 (reference) | 7.38                  |
+| MLX 6-bit (gs=64)    | 7.40                  |
 
 Notes:
+
 - Results from local runs on Apple Silicon using MLX; numbers vary slightly with tokenizer details, logits dtype, and token subset.
 - For more sensitive comparisons, use overlapping windows (e.g., `--stride 512`) and evaluate the full split.
 
@@ -89,6 +91,7 @@ python -m mlx_lm convert \
 ## Sibling & reference models
 
 - halley-ai/gpt-oss-120b-MLX-bf16 (non-quantized reference)
+- halley-ai/gpt-oss-120b-MLX-6bit-gs64 (smaller/faster variant)
 
 ## Limitations & biases
 
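For context on the second note in the PPL table, here is a minimal sketch of overlapping-window (strided) perplexity evaluation with mlx-lm. The model path, window size, and scoring loop are illustrative assumptions; only the stride value and the WikiText-2 (raw, test) split come from the README, and the repo's own eval preset may differ.

```python
# Hedged sketch: strided (overlapping-window) perplexity with mlx-lm.
# Assumptions: model path and window=2048; only stride=512 comes from the README.
import math
import mlx.core as mx
from mlx_lm import load
from datasets import load_dataset

model, tokenizer = load("halley-ai/gpt-oss-120b-MLX-6bit-gs64")  # example path
text = "\n\n".join(load_dataset("wikitext", "wikitext-2-raw-v1", split="test")["text"])
tokens = tokenizer.encode(text)

window, stride = 2048, 512  # window size is an assumed preset
total_nll, total_tokens, prev_end = 0.0, 0, 1
for begin in range(0, len(tokens), stride):
    end = min(begin + window, len(tokens))
    inputs = mx.array(tokens[begin:end])[None]  # (1, T)
    logits = model(inputs[:, :-1])              # next-token logits, (1, T-1, V)
    logprobs = mx.take_along_axis(
        logits - mx.logsumexp(logits, axis=-1, keepdims=True),
        inputs[:, 1:][..., None], axis=-1,
    ).squeeze(-1)                               # log p(target), (1, T-1)
    keep = end - prev_end  # score only tokens not counted by earlier windows
    total_nll += float(-logprobs[0, -keep:].sum())
    total_tokens += keep
    prev_end = end
    if end == len(tokens):
        break

print(f"PPL: {math.exp(total_nll / total_tokens):.2f}")
```

Overlapping windows give each scored token more left context than disjoint chunks, which is why the note recommends them for sensitive comparisons; the `keep` bookkeeping ensures no token is counted twice.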
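A hedged sketch of how the new 6-bit sibling could be produced with the `python -m mlx_lm convert` command referenced in the hunk above. The Hugging Face source path and output directory are assumptions; the bit width and group size follow from the model name.

```bash
# Hedged sketch: quantizing the reference weights to 6-bit, group size 64.
# Source path and output directory are assumptions, not the repo's recipe.
python -m mlx_lm convert \
  --hf-path openai/gpt-oss-120b \
  --mlx-path gpt-oss-120b-MLX-6bit-gs64 \
  -q --q-bits 6 --q-group-size 64
```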