
license: apache-2.0
base_model: FuseAI/FuseO1-DeepSeekR1-Qwen2.5-Coder-32B-Preview
tags:
  - mlx

13 TPS (tokens per second) without speculative decoding.

27 TPS with speculative decoding in LM Studio.

Draft model: DeepScaleR-1.5B-Preview-Q8

MacBook M4 Max: High Power mode

system prompt: "You are Fuse01. You answer very direct brief and concise"

prompt: "Write a quick sort in C++"

Context: 131072, Temp: 0
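To put the two throughput numbers above in perspective, here is a small arithmetic sketch. It assumes the 13 TPS figure is the run without speculative decoding; the helper names and the 1000-token response length are made up for illustration and are not part of any benchmark tool.

```python
# Throughput figures reported in this README (MacBook M4 Max).
BASELINE_TPS = 13.0      # assumed: run without speculative decoding
SPECULATIVE_TPS = 27.0   # with speculative decoding in LM Studio


def speedup(baseline_tps: float, new_tps: float) -> float:
    """Ratio of the new throughput to the baseline throughput."""
    return new_tps / baseline_tps


def seconds_to_generate(num_tokens: int, tps: float) -> float:
    """Wall-clock seconds needed to emit num_tokens at a steady tps."""
    return num_tokens / tps


if __name__ == "__main__":
    tokens = 1000  # hypothetical response length
    print(f"speedup: {speedup(BASELINE_TPS, SPECULATIVE_TPS):.2f}x")
    print(f"baseline:    {seconds_to_generate(tokens, BASELINE_TPS):.1f} s")
    print(f"speculative: {seconds_to_generate(tokens, SPECULATIVE_TPS):.1f} s")
```

So the draft model roughly doubles generation speed on this prompt. Other prompts will likely see a different ratio, since the gain depends on how often the draft model guesses right.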

Try this model in Visual Studio Code with the Roo Code extension. Start in Architect Mode and let it auto-switch to Code Mode; it actually spits out decent code for small projects with multiple files, close to last year's Claude Sonnet on small projects. It stays reasonably stable even with Roo Code's huge 10k system prompt. The model still shits the bed on big projects, but it does better after adding roo-code-memory-bank.

All the smaller quants I tested shit the bed

All the smaller models I tested shit the bed

So far (Feb 20, 2025) this is the only model & quant that runs fast on Mac, spits decent code AND works with Speculative Decoding.

Huge thanks to all who helped Macs get this far!

bobig/FuseO1-DeepSeekR1-Qwen2.5-Coder-32B-Preview-Q8

The model bobig/FuseO1-DeepSeekR1-Qwen2.5-Coder-32B-Preview-Q8 was converted to MLX format from FuseAI/FuseO1-DeepSeekR1-Qwen2.5-Coder-32B-Preview using mlx-lm version 0.21.4. (FYI: the base model and the draft model should both be converted with the same mlx-lm version.)
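If you are wondering what the draft model is for, here is a toy sketch of greedy speculative decoding (the Temp: 0 case). It is not the mlx-lm or LM Studio implementation: `toy_target` and `toy_draft` are hypothetical stand-ins for the 32B model and the 1.5B draft model, and tokens are just small integers.

```python
from typing import Callable, List, Tuple

# A "model" here is a function mapping a token list to its greedy next token.
NextToken = Callable[[List[int]], int]


def greedy_decode(target: NextToken, prompt: List[int], max_new: int) -> List[int]:
    """Plain decoding: one target-model call per generated token."""
    tokens = list(prompt)
    for _ in range(max_new):
        tokens.append(target(tokens))
    return tokens[len(prompt):]


def speculative_decode(
    target: NextToken,
    draft: NextToken,
    prompt: List[int],
    max_new: int,
    num_draft: int = 4,
) -> Tuple[List[int], int]:
    """Greedy speculative decoding; returns (new tokens, verification rounds)."""
    tokens = list(prompt)
    verify_rounds = 0
    while len(tokens) - len(prompt) < max_new:
        # 1. The cheap draft model guesses the next num_draft tokens.
        proposal: List[int] = []
        for _ in range(num_draft):
            proposal.append(draft(tokens + proposal))

        # 2. The target model checks the guesses. A real implementation scores
        #    the whole proposal in one batched forward pass; this toy calls the
        #    target per position for clarity and counts the round once.
        verify_rounds += 1
        accepted: List[int] = []
        for tok in proposal:
            expected = target(tokens + accepted)
            accepted.append(expected)  # always keep the target's own token
            if tok != expected:
                break                  # first wrong guess ends the round
        else:
            # Every guess matched, so the same pass yields one bonus token.
            accepted.append(target(tokens + accepted))
        tokens.extend(accepted)
    return tokens[len(prompt):len(prompt) + max_new], verify_rounds


def toy_target(tokens: List[int]) -> int:
    """Hypothetical big model: counts upward modulo 10."""
    return (tokens[-1] + 1) % 10


def toy_draft(tokens: List[int]) -> int:
    """Hypothetical small model: agrees with the target except right after a 7."""
    return 0 if tokens[-1] == 7 else (tokens[-1] + 1) % 10


if __name__ == "__main__":
    plain = greedy_decode(toy_target, [0], 12)
    fast, rounds = speculative_decode(toy_target, toy_draft, [0], 12)
    print(plain == fast, rounds)
```

With greedy decoding the output is the same as running the big model alone; the draft model only changes how many big-model passes are needed. That is also why a draft model that disagrees with the base model too often buys little or nothing.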

Use with mlx

pip install mlx-lm

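from mlx_lm import load, generate

# Download (on first use) and load the MLX weights and tokenizer.
model, tokenizer = load("bobig/FuseO1-DeepSeekR1-Qwen2.5-Coder-32B-Preview-Q8")

prompt = "hello"

# Wrap the prompt in the model's chat template when one is defined.
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True
    )

# verbose=True prints the response as it is generated, plus generation stats.
response = generate(model, tokenizer, prompt=prompt, verbose=True)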

Are you still reading down here? Really?

Maybe use your OCD super powers to try this new Q4 lossless quant compression and tell us how to improve mlx-lm to get 8-bit quality at 4-bit speed! https://huggingface.co/NexaAIDev/DeepSeek-R1-Distill-Qwen-1.5B-NexaQuant
