PursuitOfDataScience committed (verified)

Commit 72d89a9 · Parent: 448330c

Update README.md

added evaluation metric.

Files changed (1): README.md (+20, -0)
README.md CHANGED
@@ -26,6 +26,26 @@ This model is a LoRA (Low-Rank Adaptation) fine-tuned version of **Qwen2.5-1.5B-
 
 
 ---
 
+## Evaluation on MATH-500 Benchmark
+
+Following the sampling-based Pass@1 methodology of [DeepSeek R1](https://arxiv.org/abs/2501.12948), we evaluated the model with the settings below:
+
+| Parameter | Value |
+|------------------|---------|
+| **Dataset** | `HuggingFaceH4/MATH-500` |
+| **Temperature** | `0.6` |
+| **Top_p** | `0.95` |
+| **Num_samples** | `16` per question |
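The settings in the table map directly onto standard sampling parameters. A minimal sketch of such a configuration (the `gen_kwargs` dict and the token budget are illustrative, not taken from the evaluation):

```python
# Illustrative sampling configuration mirroring the table above; the exact
# generation call depends on your inference stack (e.g. transformers or vLLM).
gen_kwargs = {
    "do_sample": True,            # enable sampling rather than greedy decoding
    "temperature": 0.6,           # from the table
    "top_p": 0.95,                # from the table
    "num_return_sequences": 16,   # 16 samples per question, as in the table
    "max_new_tokens": 2048,       # assumed token budget, not from the table
}
```

With `transformers`, a dict like this could be unpacked into `model.generate(**inputs, **gen_kwargs)` to draw the 16 completions per question.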
+
+### Results
+
+- **At-least-one-correct rate:** **54.60%** (273 out of 500 questions)
+
+*This metric is the percentage of questions for which at least one of the sampled solutions is correct.*
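This metric is straightforward to compute once each sampled answer has been graded. A minimal sketch (the `at_least_one_correct_rate` function and the toy `results` data are illustrative, not the actual evaluation harness):

```python
# Sketch: computing the "at-least-one-correct" rate from graded samples.
# `is_correct` is a list with one entry per question; each entry is a list of
# booleans, one per sampled completion (16 per question in the table above).

def at_least_one_correct_rate(is_correct):
    """Fraction of questions where any sampled answer is correct."""
    solved = sum(1 for samples in is_correct if any(samples))
    return solved / len(is_correct)

# Toy example: 4 questions, 3 samples each.
results = [
    [False, True, False],   # solved on the second attempt
    [False, False, False],  # never solved
    [True, True, True],     # solved every time
    [False, False, True],   # solved on the last attempt
]
print(at_least_one_correct_rate(results))  # 3 of 4 questions solved -> 0.75
```

Applied to the full run, 273 of 500 questions had at least one correct sample, giving the 54.60% reported above.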
+
+---
+
 ## How to Use
 
 ### Example Python Script