Bohan22 nielsr HF Staff committed on
Commit 335a42c · verified · 1 parent: 2c592bf

Improve model card with metadata and additional usage information (#2)

- Improve model card with metadata and additional usage information (9e6b3fc81dda74449f1042525ab289f163a9bb80)

Co-authored-by: Niels Rogge <[email protected]>

Files changed (1)
  1. README.md +75 -75
README.md CHANGED
@@ -1,90 +1,34 @@
---
- license: apache-2.0
base_model:
- Qwen/Qwen3-8B
---

<div align="center">
<h1> <a href="http://blog.goedel-prover.com"> <strong>Goedel-Prover-V2: The Strongest Open-Source Theorem Prover to Date</strong></a></h1>
</div>

- <head>
-   <meta charset="UTF-8">
-   <style>
-     .badges {
-       display: flex;
-       justify-content: center;
-       gap: 0.5rem;                  /* eliminate whitespace between badges */
-       padding: 0.8rem;
-     }
-     .badge {
-       display: inline-flex;         /* inline flex for pill layout */
-       align-items: center;          /* center icon & text vertically */
-       padding: 0.4rem 0.4rem;       /* larger badge size */
-       background: #333;
-       color: #fff;                  /* white text */
-       text-decoration: none;
-       font-family: sans-serif;
-       font-size: 0.8rem;            /* larger font */
-       border-radius: 999em;         /* full pill shape */
-       transition: background 0.15s; /* smooth hover */
-     }
-     .badge:hover {
-       background: #444;
-     }
-     .badge svg,
-     .badge .emoji {
-       width: 1.7em;                 /* scale icons with badge text */
-       height: 1.7em;
-       margin-right: 0.5em;
-       fill: currentColor;           /* inherit badge color */
-     }
-   </style>
- </head>
- <body>
-   <div class="badges">
-     <a href="https://arxiv.org/abs/2508.03613"
-        class="badge"
-        target="_blank"
-        rel="noopener">
-       <!-- use an emoji or inline-SVG for the PDF icon -->
-       <span class="emoji"><h1>📄</h1></span>
-       arXiv
-     </a>
-     <a href="http://blog.goedel-prover.com"
-        class="badge" target="_blank" rel="noopener">
-       <span class="emoji"><h1>🌐</h1></span>Website</a><a href="https://huggingface.co/Goedel-LM/Goedel-Prover-V2-32B"
-        class="badge" target="_blank" rel="noopener">
-       <span class="emoji"><h1>🤗</h1></span>HuggingFace</a>
-     <a href="https://github.com/Goedel-LM/Goedel-Prover-V2" class="badge" target="_blank" rel="noopener">
-       <svg viewBox="0 0 64 64" xmlns="http://www.w3.org/2000/svg">
-         <path d="M32.029,8.345c-13.27,0-24.029,10.759-24.029,24.033c0,10.617 6.885,19.624
-                  16.435,22.803c1.202,0.22 1.64-0.522 1.64-1.16c0-0.569-0.02-2.081-0.032-4.086
-                  c-6.685,1.452-8.095-3.222-8.095-3.222c-1.093-2.775-2.669-3.514-2.669-3.514
-                  c-2.182-1.492,0.165-1.462,0.165-1.462c2.412,0.171 3.681,2.477 3.681,2.477
-                  c2.144,3.672 5.625,2.611 6.994,1.997c0.219-1.553 0.838-2.612 1.526-3.213
-                  c-5.336-0.606-10.947-2.669-10.947-11.877c0-2.623 0.937-4.769 2.474-6.449
-                  c-0.247-0.608-1.072-3.051 0.235-6.36c0,0 2.018-0.646 6.609,2.464c1.917-0.533
-                  3.973-0.8 6.016-0.809c2.041,0.009 4.097,0.276 6.017,0.809c4.588-3.11
-                  6.602-2.464 6.602-2.464c1.311,3.309 0.486,5.752 0.239,6.36c1.54,1.68
-                  2.471,3.826 2.471,6.449c0,9.232-5.62,11.263-10.974,11.858c0.864,0.742
-                  1.632,2.208 1.632,4.451c0,3.212-0.029,5.804-0.029,6.591c0,0.644
-                  0.432,1.392 1.652,1.157c9.542-3.185 16.421-12.186 16.421-22.8c0-13.274
-                  -10.76-24.033-24.034-24.033"/>
-       </svg>
-       Code
-     </a>
-   </div>
- </body>

## 1. Introduction

- We introduce Goedel-Prover-V2, an open-source language model series that sets a new state-of-the-art in automated formal proof generation. Built on the standard expert iteration and reinforcement learning pipeline, our approach incorporates three key innovations: (1) <strong>Scaffolded data synthesis</strong>: We generate synthetic proof tasks of increasing difficulty to progressively train the model, enabling it to master increasingly complex theorems; (2) <strong>Verifier-guided self-correction</strong>: The model learns to iteratively revise its own proofs by leveraging feedback from Lean’s compiler, closely mimicking how humans refine their work; (3) <strong>Model averaging</strong>: We combine multiple model checkpoints to improve robustness and overall performance.

- Our small model, Goedel-Prover-V2-8B, reaches 83.0% on MiniF2F test set at Pass@32, matching the performance of prior state-of-the-art DeepSeek-Prover-V2-671B while being nearly 100 times smaller in model size. Our flagship model, Goedel-Prover-V2-32B, achieves 88.0% on MiniF2F at Pass@32 on standard mode and 90.4% on self-correction mode, outperforming prior SOTA DeepSeek-Prover-V2-671B and concurrent work Kimina-Prover-72B by a large margin. Additionaly, our flagship model with self-correction solves 64 problems on PutnamBench at Pass@64, securing the 1st on the leaderboard surpassing DeepSeek-Prover-V2-671B's record of solving 47 problems by Pass@1024.

## 2. Benchmark Performance

- <strong>Self-correction mode</strong>: Our model improves proof quality by first generating an initial candidate and then using Lean compiler feedback to iteratively revise it. We perform two rounds of self-correction, which remain computationally efficient—the total output length (including the initial proof and two revisions) increases only modestly from the standard 32K to 40K tokens.


<style>
@@ -231,11 +175,63 @@ We release our Goedel-Prover-V2 models and the new MathOlympiadBench benchmark t

</div>

- <strong>MathOlympiadBench</strong> (Math Olympiad Bench) comprises human-verified formalizations of Olympiad-level mathematical competition problems, sourced from [Compfiles](https://dwrensha.github.io/compfiles/imo.html) and [IMOSLLean4 repository](https://github.com/mortarsanjaya/IMOSLLean4). MathOlympiadBench contains 360 problems, including 158 IMO problems from 1959 to 2024, 131 IMO shortlist problems covering 2006 to 2023, 68 regional mathematical Olympiad problems, and 3 additional mathematical puzzles.

This model is being released to aid other open-source projects, including those geared towards the upcoming IMO competition. A full paper with all details will be released in the coming weeks.

- ## 5. Quick Start
You can directly use [Huggingface's Transformers](https://github.com/huggingface/transformers) for model inference.

  ```python
@@ -284,9 +280,13 @@ print(tokenizer.batch_decode(outputs))
print(time.time() - start)
```

- ### Cite
```bibtex
@article{lin2025goedelproverv2,
title={Goedel-Prover-V2: Scaling Formal Theorem Proving with Scaffolded Data Synthesis and Self-Correction},

---
base_model:
- Qwen/Qwen3-8B
+ license: apache-2.0
+ library_name: transformers
+ pipeline_tag: text-generation
---
+
<div align="center">
<h1> <a href="http://blog.goedel-prover.com"> <strong>Goedel-Prover-V2: The Strongest Open-Source Theorem Prover to Date</strong></a></h1>
</div>

+ <div align="center">
+
+ [![Website](https://img.shields.io/badge/%F0%9F%A4%96%20Homepage-Goedel-536af5?color=536af5&logoColor=white)](http://blog.goedel-prover.com)
+ [![GitHub](https://img.shields.io/badge/GitHub-Code-black.svg?logo=github)](https://github.com/Goedel-LM/Goedel-Prover-V2)
+ [![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20face-Goedel-ffc107?color=ffc107&logoColor=white)](https://huggingface.co/Goedel-LM/Goedel-Prover-V2-32B)
+ [![arXiv](https://img.shields.io/badge/arXiv-2508.03613-b31b1b.svg?style=flat)](https://arxiv.org/abs/2508.03613)
+ [![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
+
+ </div>

## 1. Introduction

+ We introduce Goedel-Prover-V2, an open-source language model series that sets a new state-of-the-art in automated formal proof generation. Built on the standard expert iteration and reinforcement learning pipeline, our approach incorporates three key innovations: (1) **Scaffolded data synthesis**: We generate synthetic proof tasks of increasing difficulty to progressively train the model, enabling it to master increasingly complex theorems; (2) **Verifier-guided self-correction**: The model learns to iteratively revise its own proofs by leveraging feedback from Lean’s compiler, closely mimicking how humans refine their work; (3) **Model averaging**: We combine multiple model checkpoints to improve robustness and overall performance.

+ Our small model, Goedel-Prover-V2-8B, reaches 83.0% on the MiniF2F test set at Pass@32, matching the performance of the prior state-of-the-art DeepSeek-Prover-V2-671B while being nearly 100 times smaller. Our flagship model, Goedel-Prover-V2-32B, achieves 88.0% on MiniF2F at Pass@32 in standard mode and 90.4% in self-correction mode, outperforming the prior SOTA DeepSeek-Prover-V2-671B and the concurrent Kimina-Prover-72B by a large margin. Additionally, our flagship model with self-correction solves 64 problems on PutnamBench at Pass@64, securing 1st place on the leaderboard and surpassing DeepSeek-Prover-V2-671B's record of 47 problems at Pass@1024.
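Pass@k here is the standard unbiased estimator (as popularized by the HumanEval benchmark): given n sampled proofs per problem of which c compile, it gives the probability that a random subset of k samples contains at least one success. A minimal sketch, not taken from the Goedel-Prover-V2 codebase:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased Pass@k: probability that at least one of k samples
    drawn (without replacement) from n attempts, c of them correct,
    solves the problem."""
    if n - c < k:
        return 1.0  # every size-k subset must contain a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 32 attempts on one problem, 8 of which compile.
print(pass_at_k(32, 8, 4))
```

Per-benchmark scores are then the average of this quantity over all problems.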

## 2. Benchmark Performance

+ **Self-correction mode**: Our model improves proof quality by first generating an initial candidate and then using Lean compiler feedback to iteratively revise it. We perform two rounds of self-correction, which remains computationally efficient: the total output length (including the initial proof and two revisions) grows only modestly, from the standard 32K to 40K tokens.
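Conceptually, the self-correction loop can be sketched as follows; `generate`, `revise`, and `compile_check` are hypothetical stand-ins for the model call and the Lean compiler (the real pipeline lives in the repository's scripts), exercised here with toy stubs:

```python
from typing import Callable, Optional, Tuple

def self_correct(
    generate: Callable[[str], str],
    revise: Callable[[str, str, str], str],
    compile_check: Callable[[str], Tuple[bool, str]],
    theorem: str,
    rounds: int = 2,
) -> Optional[str]:
    """Generate an initial proof, then revise it up to `rounds` times
    using compiler feedback; return the first proof that compiles."""
    proof = generate(theorem)
    for _ in range(rounds + 1):  # initial attempt + `rounds` revisions
        ok, feedback = compile_check(proof)
        if ok:
            return proof
        proof = revise(theorem, proof, feedback)
    return None

# Toy stand-ins: this fake "compiler" only accepts the proof "rfl".
attempts = iter(["sorry", "simp", "rfl"])
proof = self_correct(
    generate=lambda thm: next(attempts),
    revise=lambda thm, p, fb: next(attempts),
    compile_check=lambda p: (p == "rfl", "" if p == "rfl" else "error: proof failed"),
    theorem="theorem demo : 1 + 1 = 2",
)
print(proof)  # rfl
```

With two revision rounds, at most three candidate proofs are compiled per problem, which is where the modest 32K-to-40K token growth comes from.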


<style>

</div>

+ **MathOlympiadBench** comprises human-verified formalizations of Olympiad-level mathematical competition problems, sourced from [Compfiles](https://dwrensha.github.io/compfiles/imo.html) and the [IMOSLLean4 repository](https://github.com/mortarsanjaya/IMOSLLean4). It contains 360 problems: 158 IMO problems from 1959 to 2024, 131 IMO shortlist problems covering 2006 to 2023, 68 regional mathematical Olympiad problems, and 3 additional mathematical puzzles.

This model is being released to aid other open-source projects, including those geared towards the upcoming IMO competition. A full paper with all details will be released in the coming weeks.

+ ## 5. Environment Setup
+
+ We follow [DeepSeek-Prover-V1.5](https://github.com/deepseek-ai/DeepSeek-Prover-V1.5), which uses Lean 4 version 4.9 and the corresponding Mathlib. Please refer to the following instructions to set up the environment.
+
+ ### Requirements
+
+ * Supported platform: Linux
+ * Python 3.10
+
+ ### Installation
+
+ 1. **Install Lean 4**
+
+    Follow the instructions on the [Lean 4 installation page](https://leanprover.github.io/lean4/doc/quickstart.html) to set up Lean 4.
+
+ 2. **Clone the repository**
+
+    ```sh
+    git clone --recurse-submodules https://github.com/Goedel-LM/Goedel-Prover-V2.git
+    cd Goedel-Prover-V2
+    ```
+
+ 3. **Install required packages**
+
+    ```sh
+    conda env create -f goedelv2.yml
+    ```
+
+ If you encounter an installation error with some packages (e.g., flash-attn), run the following:
+
+    ```sh
+    conda activate goedelv2
+    pip install torch==2.6.0
+    conda env update --file goedelv2.yml
+    ```
+
+ 4. **Build Mathlib4**
+
+    ```sh
+    cd mathlib4
+    lake build
+    ```
+
+ 5. **Test Lean 4 and mathlib4 installation**
+
+    ```sh
+    cd ..
+    python lean_compiler/repl_scheduler.py
+    ```
+
+ If there is any error, reinstall Lean 4 and rebuild mathlib4.
+
+ If you have already installed Lean and Mathlib for other projects and want to reuse them, note that you might need to modify `DEFAULT_LAKE_PATH` and `DEFAULT_LEAN_WORKSPACE` in `lean_compiler/repl_scheduler.py`.
+
+ ## 6. Quick Start
You can directly use [Huggingface's Transformers](https://github.com/huggingface/transformers) for model inference.

```python

print(time.time() - start)
```
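For reference, the prover's input is a Lean 4 theorem statement whose proof is left as `sorry`; below is a minimal, hypothetical miniF2F-style example of such a statement (the exact prompt template is defined in the repository's scripts):

```lean
import Mathlib

-- Hypothetical miniF2F-style input: the model is asked to replace
-- `sorry` with a proof that compiles against Mathlib.
theorem add_comm_example (a b : ℕ) : a + b = b + a := by
  sorry
```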
+ ## 7. Batch Inference and Self-correction

+ ```sh
+ bash scripts/pipeline.sh
+ ```

+ ## 8. Citation
```bibtex
@article{lin2025goedelproverv2,
title={Goedel-Prover-V2: Scaling Formal Theorem Proving with Scaffolded Data Synthesis and Self-Correction},