JetBrains-Research/rocq-language-theorem-embeddings

kdizzled committed 6 days ago

Commit eb56d3b (verified) · 1 parent: 23c2adf

Update README.md

Files changed (1)

README.md +1 -12

README.md CHANGED

@@ -41,7 +41,7 @@ Go to [https://github.com/JetBrains-Research/big-rocq](https://github.com/JetBra
 ### Training Procedure
 * **Objective:** InfoNCE
-* **Batch size:** 32
+* **Batch size:** 16
 * **Optimizer / LR:** AdamW, lr = 4e‑6, linear warm‑up 10 %, 22k steps
 * **Hardware:** 1× NVIDIA H100 GPU, 160 GB RAM, 14 h wall‑clock
@@ -52,17 +52,6 @@ Go to [https://github.com/JetBrains-Research/big-rocq](https://github.com/JetBra
 * **Dataset:** IMM‑300 (300 Rocq theorems) from the IMM project
 * **Metrics:** Downstream proof success rate of CoqPilot when given top‑7 retrieved premises; averaged over 12 generations.
-### Results
-| Model (back‑end) | Bucket | Baseline Jaccard | **RocqStar** |
-| ---------------- | ------ | ---------------- | ------------ |
-| GPT‑4o           | ≤ 4    | 48 ± 5 %         | **51 ± 5 %** |
-| GPT‑4o           | 5–8    | 18 ± 4 %         | **25 ± 3 %** |
-| GPT‑4o           | 9–20   | 11 ± 4 %         | 11 ± 5 %     |
-| Claude 3.5       | ≤ 4    | 58 ± 5 %         | **61 ± 4 %** |
-| Claude 3.5       | 5–8    | 28 ± 5 %         | **36 ± 5 %** |
-| Claude 3.5       | 9–20   | 16 ± 5 %         | **21 ± 5 %** |
 #### Summary
 RocqStar delivers consistent gains, up to 28% relative improvement over Jaccard-index based retrieval, especially for medium‑length theorems where proof similarity diverges most from statement similarity.
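This commit changes the InfoNCE batch size from 32 to 16. If training uses in-batch negatives (an assumption; the README names only the objective), the batch size fixes how many negatives each anchor is scored against: 31 before this change, 15 after it. The sketch below is a minimal, dependency-free illustration under further assumptions: cosine similarity, a temperature of 0.05, and exactly one positive per anchor. `cosine` and `info_nce` are hypothetical helpers, not the model's training code.

```python
import math

# Hypothetical sketch of InfoNCE with in-batch negatives. The README names only
# the objective; cosine similarity, temperature=0.05 and one positive per anchor
# are illustrative assumptions.


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two non-zero dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchors: list[list[float]], positives: list[list[float]],
             temperature: float = 0.05) -> float:
    """Mean InfoNCE loss over one batch.

    Row i of `anchors` is paired with row i of `positives`; every other
    positive in the batch acts as a negative for anchor i, so a batch of
    size B gives each anchor B - 1 negatives.
    """
    assert anchors and len(anchors) == len(positives)
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        # Numerically stable log-sum-exp over all candidates in the batch.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        # Cross-entropy with the matching positive (index i) as the target.
        total += log_denom - logits[i]
    return total / len(anchors)


# Uninformative embeddings: every pair looks identical, so the loss sits at chance.
identical = [[1.0, 0.0]] * 16
print(round(info_nce(identical, identical), 4))  # 2.7726, i.e. log(16)
```

When every embedding is identical the loss equals log B, the chance level. For B = 16 that is about 2.7726, compared with log 32 ≈ 3.4657 at the previous batch size, so raw loss values are not directly comparable across the two settings.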
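The optimizer line specifies a peak learning rate of 4e-6, 22k steps, and a 10 % linear warm-up, which works out to 2,200 warm-up steps. The README does not say what happens after warm-up. This sketch assumes linear decay to zero, the same shape as the multiplier produced by `get_linear_schedule_with_warmup` in Hugging Face `transformers`. `linear_warmup_lr` is a hypothetical helper for illustration only.

```python
def linear_warmup_lr(step: int, peak_lr: float = 4e-6, total_steps: int = 22_000,
                     warmup_frac: float = 0.10) -> float:
    """Learning rate at `step` for linear warm-up followed by linear decay.

    Hypothetical helper: the decay-to-zero phase after warm-up is an
    assumption, not something the README states.
    """
    warmup_steps = round(total_steps * warmup_frac)  # 2,200 with the defaults
    if step < warmup_steps:
        # Ramp linearly from 0 to peak_lr over the warm-up phase.
        return peak_lr * step / warmup_steps
    # Assumed: decay linearly from peak_lr to 0 over the remaining steps.
    decay_steps = total_steps - warmup_steps
    return peak_lr * max(0.0, (total_steps - step) / decay_steps)


print(linear_warmup_lr(2_200))  # 4e-06, the peak at the end of warm-up
```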
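The baseline that RocqStar is compared against retrieves the top-7 premises by Jaccard index. Here is a minimal sketch, assuming the Jaccard index is taken over token sets of theorem statements with a naive whitespace-and-punctuation tokenizer. The actual baseline may tokenize and normalize differently, and `tokenize`, `jaccard` and `top_k_premises` are hypothetical names.

```python
def tokenize(statement: str) -> set[str]:
    """Assumed tokenizer: split on whitespace after removing parentheses and commas."""
    for ch in "(),":
        statement = statement.replace(ch, " ")
    return set(statement.split())


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard index |A ∩ B| / |A ∪ B|; defined as 0.0 for two empty sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def top_k_premises(query: str, candidates: list[str], k: int = 7) -> list[str]:
    """Return the k candidates whose statements overlap most with the query."""
    q = tokenize(query)
    # sorted() is stable, so ties keep their original candidate order.
    ranked = sorted(candidates, key=lambda c: jaccard(q, tokenize(c)), reverse=True)
    return ranked[:k]


query = "forall n : nat, n + 0 = n"
candidates = [
    "forall l : list nat, length (rev l) = length l",
    "forall n m : nat, n + m = m + n",
    "forall n : nat, 0 + n = n",
]
print(top_k_premises(query, candidates, k=2))
```

In this example `0 + n = n` ranks first only because its tokens match the query exactly. That is the statement-level signal the Summary contrasts with proof similarity.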
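A relative improvement divides the absolute gain by the baseline success rate. For example, the Claude 3.5 row for the 5–8 bucket in the removed Results table goes from 28 % to 36 %. That is an 8-point absolute gain and roughly a 28.6 % relative one. `relative_improvement` is a hypothetical helper that illustrates the arithmetic.

```python
def relative_improvement(baseline: float, new: float) -> float:
    """Gain of `new` over `baseline`, expressed as a fraction of the baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (new - baseline) / baseline


# Claude 3.5, bucket 5–8, from the removed Results table: 28 % -> 36 %.
gain = relative_improvement(28.0, 36.0)
print(f"{gain:.1%}")  # 28.6%
```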