Distill Qwen3-Coder-480B-A35B on top of your Qwen3-30B-A3B-Thinking-2507-Deepseek-v3.1-Distill
#9 opened by NIK2703
As my tests have shown, this model performs better than the base model on programming tasks. However, general-purpose thinking models such as Qwen3-30B-A3B-Thinking-2507-Deepseek-v3.1-Distill and gpt-oss-20b often outperform both of them on complex problems. It would be interesting to see a model that combines DeepSeek's reasoning abilities with Coder-480B's programming skills.