Improve model card with metadata and additional usage information (#2)
- Improve model card with metadata and additional usage information (9e6b3fc81dda74449f1042525ab289f163a9bb80)
Co-authored-by: Niels Rogge <[email protected]>

README.md CHANGED
---
base_model:
- Qwen/Qwen3-8B
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
---

<div align="center">
<h1> <a href="http://blog.goedel-prover.com"> <strong>Goedel-Prover-V2: The Strongest Open-Source Theorem Prover to Date</strong></a></h1>
</div>

<div align="center">

[Website](http://blog.goedel-prover.com) | [Code](https://github.com/Goedel-LM/Goedel-Prover-V2) | [HuggingFace](https://huggingface.co/Goedel-LM/Goedel-Prover-V2-32B) | [Paper](https://arxiv.org/abs/2508.03613) | [License](https://opensource.org/licenses/Apache-2.0)

</div>

## 1. Introduction

We introduce Goedel-Prover-V2, an open-source language model series that sets a new state-of-the-art in automated formal proof generation. Built on the standard expert iteration and reinforcement learning pipeline, our approach incorporates three key innovations: (1) **Scaffolded data synthesis**: we generate synthetic proof tasks of increasing difficulty to progressively train the model, enabling it to master increasingly complex theorems; (2) **Verifier-guided self-correction**: the model learns to iteratively revise its own proofs by leveraging feedback from Lean's compiler, closely mimicking how humans refine their work; (3) **Model averaging**: we combine multiple model checkpoints to improve robustness and overall performance.

Our small model, Goedel-Prover-V2-8B, reaches 83.0% on the MiniF2F test set at Pass@32, matching the performance of the prior state-of-the-art DeepSeek-Prover-V2-671B while being nearly 100 times smaller. Our flagship model, Goedel-Prover-V2-32B, achieves 88.0% on MiniF2F at Pass@32 in standard mode and 90.4% in self-correction mode, outperforming the prior SOTA DeepSeek-Prover-V2-671B and the concurrent Kimina-Prover-72B by a large margin. Additionally, our flagship model with self-correction solves 64 problems on PutnamBench at Pass@64, securing first place on the leaderboard and surpassing DeepSeek-Prover-V2-671B's record of 47 problems at Pass@1024.
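The Pass@k numbers above come from repeated sampling. For reference, a common way to estimate Pass@k from n samples with c verified successes is the standard unbiased estimator; this is a generic utility, not code from this repository:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased Pass@k estimator: probability that at least one of k
    samples, drawn without replacement from n attempts of which c are
    correct, succeeds."""
    if n - c < k:
        return 1.0  # every size-k subset must contain a correct attempt
    return 1.0 - comb(n - c, k) / comb(n, k)

# e.g. 16 verified proofs out of 32 samples
print(pass_at_k(32, 16, 1))  # single-draw success rate: 0.5
```

Reporting Pass@32 then corresponds to `pass_at_k(n, c, 32)` averaged over problems.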

## 2. Benchmark Performance

**Self-correction mode**: Our model improves proof quality by first generating an initial candidate and then using Lean compiler feedback to iteratively revise it. We perform two rounds of self-correction, which remain computationally efficient: the total output length (including the initial proof and two revisions) increases only modestly, from the standard 32K to 40K tokens.
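The self-correction loop described above can be sketched as follows; `generate_proof`, `revise_proof`, and `compile_with_lean` are hypothetical placeholders for the model and the Lean verifier, not the repository's actual API:

```python
def self_correct(theorem, generate_proof, revise_proof, compile_with_lean,
                 max_rounds=2):
    """Generate a proof, then revise it up to `max_rounds` times
    (two rounds in our setup) using compiler feedback."""
    proof = generate_proof(theorem)
    for _ in range(max_rounds):
        ok, errors = compile_with_lean(theorem, proof)
        if ok:
            break
        proof = revise_proof(theorem, proof, errors)  # feed errors back in
    return proof
```

The loop stops early as soon as the proof compiles, so easy theorems pay no revision cost.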

<style>

</div>

**MathOlympiadBench** comprises human-verified formalizations of Olympiad-level mathematical competition problems, sourced from [Compfiles](https://dwrensha.github.io/compfiles/imo.html) and the [IMOSLLean4 repository](https://github.com/mortarsanjaya/IMOSLLean4). It contains 360 problems, including 158 IMO problems from 1959 to 2024, 131 IMO shortlist problems covering 2006 to 2023, 68 regional mathematical olympiad problems, and 3 additional mathematical puzzles.
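For a sense of the format, benchmark items are Lean 4 theorem statements to be proved against Mathlib; a toy statement in the same style (illustrative only, not drawn from the benchmark) looks like:

```lean
import Mathlib

-- A trivial competition-style statement (for illustration only).
theorem toy_example (a b : ℝ) (h₁ : a + b = 2) (h₂ : a - b = 0) : a = 1 := by
  linarith
```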

This model is being released to aid other open-source projects, including those geared towards the upcoming IMO competition. A full paper with all details will be released in the coming weeks.

## 5. Environment Setup

We follow [DeepSeek-Prover-V1.5](https://github.com/deepseek-ai/DeepSeek-Prover-V1.5), which uses Lean 4 version 4.9 and the corresponding Mathlib. Please follow the instructions below to set up the environment.

### Requirements

* Supported platform: Linux
* Python 3.10

### Installation

1. **Install Lean 4**

   Follow the instructions on the [Lean 4 installation page](https://leanprover.github.io/lean4/doc/quickstart.html) to set up Lean 4.

2. **Clone the repository**

   ```sh
   git clone --recurse-submodules https://github.com/Goedel-LM/Goedel-Prover-V2.git
   cd Goedel-Prover-V2
   ```

3. **Install required packages**

   ```sh
   conda env create -f goedelv2.yml
   ```

   If you encounter an installation error for some packages (e.g. flash-attn), run the following:

   ```sh
   conda activate goedelv2
   pip install torch==2.6.0
   conda env update --file goedelv2.yml
   ```

4. **Build Mathlib4**

   ```sh
   cd mathlib4
   lake build
   ```

5. **Test the Lean 4 and Mathlib4 installation**

   ```sh
   cd ..
   python lean_compiler/repl_scheduler.py
   ```

   If there is any error, reinstall Lean 4 and rebuild Mathlib4.

If you have already installed Lean and Mathlib for other projects and want to reuse that installation, note that you might need to modify `DEFAULT_LAKE_PATH` and `DEFAULT_LEAN_WORKSPACE` in `lean_compiler/repl_scheduler.py`.
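For illustration, those two constants might be set like this; the values below are placeholders for your own installation, not the repository defaults:

```python
# Placeholders: point these at your local toolchain and Mathlib checkout.
DEFAULT_LAKE_PATH = "/home/you/.elan/bin/lake"  # the `lake` executable
DEFAULT_LEAN_WORKSPACE = "mathlib4/"            # workspace with built Mathlib
print(DEFAULT_LAKE_PATH, DEFAULT_LEAN_WORKSPACE)
```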

## 6. Quick Start

You can directly use [Hugging Face Transformers](https://github.com/huggingface/transformers) for model inference.

```python
print(tokenizer.batch_decode(outputs))
print(time.time() - start)
```

## 7. Batch Inference and Self-correction

```sh
bash scripts/pipeline.sh
```

## 8. Citation

```bibtex
@article{lin2025goedelproverv2,
  title={Goedel-Prover-V2: Scaling Formal Theorem Proving with Scaffolded Data Synthesis and Self-Correction},