Update README.md
Browse files
README.md
CHANGED
@@ -41,7 +41,7 @@ Go to [https://github.com/JetBrains-Research/big-rocq](https://github.com/JetBra
|
|
41 |
### Training Procedure
|
42 |
|
43 |
* **Objective:** InfoNCE
|
44 |
-
* **Batch size:**
|
45 |
* **Optimizer / LR:** AdamW, lr = 4e‑6, linear warm‑up 10 %, 22k steps
|
46 |
* **Hardware:** 1× NVIDIA H100 GPU, 160 GB RAM, 14 h wall‑clock
|
47 |
|
@@ -52,17 +52,6 @@ Go to [https://github.com/JetBrains-Research/big-rocq](https://github.com/JetBra
|
|
52 |
* **Dataset:** IMM‑300 (300 Rocq theorems) from the IMM project
|
53 |
* **Metrics:** Downstream proof success rate of CoqPilot when given top‑7 retrieved premises; averaged over 12 generations.
|
54 |
|
55 |
-
### Results
|
56 |
-
|
57 |
-
| Model (back‑end) | Bucket | Baseline Jaccard | **RocqStar** |
|
58 |
-
| ---------------- | ------ | ---------------- | ------------ |
|
59 |
-
| GPT‑4o | ≤ 4 | 48 ± 5 % | **51 ± 5 %** |
|
60 |
-
| GPT‑4o | 5–8 | 18 ± 4 % | **25 ± 3 %** |
|
61 |
-
| GPT‑4o | 9–20 | 11 ± 4 % | 11 ± 5 % |
|
62 |
-
| Claude 3.5 | ≤ 4 | 58 ± 5 % | **61 ± 4 %** |
|
63 |
-
| Claude 3.5 | 5–8 | 28 ± 5 % | **36 ± 5 %** |
|
64 |
-
| Claude 3.5 | 9–20 | 16 ± 5 % | **21 ± 5 %** |
|
65 |
-
|
66 |
#### Summary
|
67 |
|
68 |
RocqStar delivers consistent gains, up to 28% relative improvement over Jaccard-index based retrieval, especially for medium‑length theorems where proof similarity diverges most from statement similarity.
|
|
|
41 |
### Training Procedure
|
42 |
|
43 |
* **Objective:** InfoNCE
|
44 |
+
* **Batch size:** 16
|
45 |
* **Optimizer / LR:** AdamW, lr = 4e‑6, linear warm‑up 10 %, 22k steps
|
46 |
* **Hardware:** 1× NVIDIA H100 GPU, 160 GB RAM, 14 h wall‑clock
|
47 |
|
|
|
52 |
* **Dataset:** IMM‑300 (300 Rocq theorems) from the IMM project
|
53 |
* **Metrics:** Downstream proof success rate of CoqPilot when given top‑7 retrieved premises; averaged over 12 generations.
|
54 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
55 |
#### Summary
|
56 |
|
57 |
RocqStar delivers consistent gains, up to 28% relative improvement over Jaccard-index based retrieval, especially for medium‑length theorems where proof similarity diverges most from statement similarity.
|