File size: 12,350 Bytes
32b912b 5465241 f777bd1 32b912b 5465241 32b912b 5465241 c5fe5e4 32b912b 5465241 1f7d2c4 c57f39d c773ec8 5465241 32b912b 5465241 32b912b 5465241 32b912b 5465241 32b912b 5465241 32b912b 6ee2012 32fc237 6ee2012 03eed23 32b912b 5465241 32b912b 5465241 bb59ee5 5465241 32b912b 5465241 32b912b 5465241 32b912b 5465241 32b912b 5465241 32b912b 5465241 32b912b 96c7d01 bb59ee5 96c7d01 f9e3ffe 96c7d01 32b912b 5465241 32b912b 5465241 32b912b 6ee2012 32b912b 5465241 32b912b 5465241 32b912b 5465241 32b912b 5465241 1f7d2c4 c57f39d c773ec8 6159f06 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 |
---
library_name: transformers
pipeline_tag: text-generation
license: apache-2.0
language:
- en
base_model:
- Qwen/Qwen3-32B
tags:
- agent
- open-source
- miromind
new_version: miromind-ai/MiroThinker-32B-SFT-v0.2
---
<div align="center">
<img src="https://cdn-uploads.huggingface.co/production/uploads/68525b342230a897a65cc1c0/87mYQ_a-4jpnMkVR4hrgm.png" width="55%" alt="MiroThinker" />
</div>
<!-- <hr> -->
<div align="center">
[](https://dr.miromind.ai/)
[](https://huggingface.co/collections/miromind-ai/mirothinker-v01-689301b6d0563321862d44a1)
[](https://huggingface.co/datasets/miromind-ai/MiroVerse-v0.1)
[](https://miromind.ai/blog/miromind-research-agent)
[](https://github.com/MiroMindAI/MiroThinker)
[](https://discord.com/invite/GPqEnkzQZd)
[](https://huggingface.co/datasets/miromind-ai/MiroFlow-Benchmarks/resolve/main/assets/wechat.png)
[](https://www.xiaohongshu.com/user/profile/5e353bd80000000001000239)
[](https://miromind.ai/)
</div>
## Introduction
MiroThinker is an open-source agentic model series built on top of Qwen3. Designed for deep research and complex, long-horizon problem solving, it integrates strong capabilities in task decomposition, multi-hop reasoning, retrieval-augmented generation, code execution, web browsing, and document/file processing, making it suitable for a wide range of real-world applications.
We have released the MiroThinker-v0.1 series, including both SFT and DPO variants at parameter scales of 8B, 14B, and 32B. Notably, MiroThinker v0.1 achieves state-of-the-art performance among open-source models on the [GAIA benchmark](https://huggingface.co/datasets/gaia-benchmark/GAIA), a rigorous evaluation suite for advanced agentic capabilities, demonstrating its strength in long-context, decision-intensive, and real-world task scenarios.
## Online Demo
Welcome to try out our online demo [here](https://dr.miromind.ai/).
## Performance
> [!IMPORTANT]
> <div>
> To prevent data leakage during searches, we block Hugging Face domains to ensure the model doesn't access answers through shortcuts.
> </div>
### GAIA Benchmark
| **Method** | Text-103<br>Best Pass@1 | Text-103<br>Pass@1 (Avg@8) | Val-165<br>Best Pass@1 | Val-165<br>Pass@1 (Avg@8) |
| ----------------------------------------------------------------- | :--: | :--: | :--: | :--: |
| Search-o1-7B | 17.5 | - | - | - |
| R1-Searcher-7B | 20.4 | - | - | - |
| WebDancer-7B | 31.0 | - | - | - |
| WebSailor-7B | 37.9 | - | - | - |
| CK-Pro-8B | 43.7 | - | 35.2 | - |
| MiroThinker-8B-SFT-v0.1 | 44.7 | 40.1 | 34.6 | 31.8 |
| + Commercial Tools | 46.6 | 42.1 | 37.6 | 33.9 |
| MiroThinker-8B-DPO-v0.1 | 46.6 | 44.8 | 37.0 | 35.4 |
| + Commercial Tools | 50.5 | 46.7 | 38.2 | 35.9 |
| | | | | |
| Search-o1-32B | 28.2 | - | - | - |
| WebThinker-32B-RL | 48.5 | - | - | - |
| WebDancer-QwQ-32B | 51.5 | - | - | - |
| WebSailor-32B | 53.2 | - | - | - |
| WebShaper-QwQ-32B | 53.3 | - | - | - |
| WebShaper-72B | 60.1 | - | - | - |
| MiroThinker-14B-SFT-v0.1 | 47.6 | 44.4 | 37.0 | 34.4 |
| + Commercial Tools | 49.5 | 47.5 | 41.8 | 39.8 |
| MiroThinker-14B-DPO-v0.1 | 48.5 | 46.6 | 42.4 | 39.2 |
| + Commercial Tools | 52.4 | 48.5 | 45.5 | 42.0 |
| MiroThinker-32B-SFT-v0.1 | 55.3 | 51.3 | 44.9 | 42.7 |
| + Commercial Tools | 58.3 | 54.2 | 48.5 | 45.8 |
| <span style="white-space:nowrap;">MiroThinker-32B-DPO-v0.1</span> | 57.3 | 54.1 | 48.5 | 45.9 |
| + Commercial Tools | **60.2** | **57.9** | **50.9** | **48.9** |
1. Following the practices of WebThinker, WebAgents, and CognitiveKernel, we report the Best Pass@1, the highest score across three runs, which often reflects stronger performance, though it may exhibit some variability. To provide a more stable measure, we additionally report Pass@1 (Avg@8), which offers greater consistency at the cost of slightly lower scores.
2. For consistency with prior open-source works, we evaluate GAIA-Text-103 using the WebAgents LLM-as-judge template, and report results on GAIA-Val-165 using the official GAIA scorer script.
3. By default, we use open-source tools wherever possible, except for the code tool [E2B](https://github.com/e2b-dev/E2B) and the Google search tool [Serper](https://serper.dev/). We use [Whisper](https://huggingface.co/openai/whisper-large-v3-turbo), [Qwen2.5-VL-72B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-72B-Instruct), and [Qwen3-235B-A22B-Thinking-2507](https://huggingface.co/Qwen/Qwen3-235B-A22B-Thinking-2507) in our implementation. The framework can be easily extended to other open-source tools of your choice.
4. Commercial tools were mainly used for multimodal capabilities and certain complex reasoning subtasks. The majority of tasks, including planning, browsing, refinement, navigation, and more, were handled by our models.
### More Benchmarks
| Method | HLE<br>Pass@1 | Frames<br>Pass@1 | BrowseComp<br>Pass@1 | <span style="white-space:nowrap;">BrowseComp-ZH</span><br>Pass@1 | WebWalkerQA<br>Pass@1 |
|-------------------------------------------------------------------|:-------------:|:----------------:|:--------------------:|:----------------------------------------------------------------:|:---------------------:|
| OpenAI Deep Research | 26.6 | - | 51.5 | 42.9 | - |
| Gemini Deep Research | 26.9 | - | - | - | - |
| Kimi-Researcher | 26.9 | 78.8 | - | - | - |
| | | | | | |
| WebDancer-7B | - | - | - | - | 36.0 |
| WebSailor-7B | - | - | 6.7 | 14.2 | - |
| MiroThinker-8B-SFT-v0.1 | - | 58.0 | 5.5 | 9.3 | 41.3 |
| MiroThinker-8B-DPO-v0.1 | - | 64.4 | 8.7 | 13.5 | 45.7 |
| | | | | | |
| WebThinker-32B-RL | - | - | - | - | 46.5 |
| WebDancer-QwQ-32B | - | - | 3.8 | 18.0 | 47.9 |
| WebSailor-32B | - | - | 10.5 | 25.5 | - |
| WebShaper-32B | - | - | - | - | 51.4 |
| MiroThinker-32B-SFT-v0.1 | 10.2 | 70.4 | 10.6 | 13.8 | 45.7 |
| <span style="white-space:nowrap;">MiroThinker-32B-DPO-v0.1</span> | 11.8 | 71.7 | 13.0 | 17.0 | 49.3 |
1. MiroThinker’s performance was tested with [this repository](https://github.com/MiroMindAI/MiroThinker) and open-source tools; other models’ results are from their papers and official sites.
2. As [MiroVerse-v0.1](https://huggingface.co/datasets/miromind-ai/MiroVerse-v0.1) mainly contains English data, the model’s Chinese capability is limited. We plan to add more Chinese data in the next version.
## Quick Start
MiroThinker-v0.1 is trained on our large-scale, high-quality trajectory and preference datasets [MiroVerse-v0.1](https://huggingface.co/datasets/miromind-ai/MiroVerse-v0.1), utilizing the efficient training framework [MiroTrain](https://github.com/MiroMindAI/MiroTrain), and enhanced with tool-use capabilities through our agentic framework [MiroFlow](https://github.com/MiroMindAI/MiroFlow).
To promote reproducibility and benefit the community, we decided to open-source the entire suite mentioned above. For more technical details, evaluation results, and usage tutorials, please visit our [GitHub repository](https://github.com/MiroMindAI/MiroThinker).
## License
MiroThinker-v0.1 is licensed under Apache 2.0.
## Contact Us
MiroThinker is developed by the MiroMind Foundation Model Team.
If you would like to leave us a message, feel free to get in touch.
In addition to [GitHub](https://github.com/MiroMindAI/),
[Discord](https://discord.com/invite/GPqEnkzQZd),
[WeChat](https://huggingface.co/datasets/miromind-ai/MiroFlow-Benchmarks/resolve/main/assets/wechat.png),
and [RedNote](https://www.xiaohongshu.com/user/profile/5e353bd80000000001000239),
you can also reach us via email at [email protected]. |