---
license: apache-2.0
tags:
- small-language-model
- jee
- exam-centric
- indian-education
- reinforcement-learning
- supervised-finetuning
- model-merging
- rejection-sampling
- mathematics
- ai4education
- physicswallah
language:
- en
model_name: PhysicsWallahAI/Aryabhatta-1.0
model_creator: Physics Wallah AI Research
model_type: Causal decoder-based model
base_model: Qwen/Qwen2.5-Math-7B
pipeline_tag: text-generation
---

# Aryabhatta 1.0 🌟

**Aryabhatta 1.0** is a 7B-parameter small language model for mathematics developed by **Physics Wallah AI Research**, optimized for high-stakes Indian competitive exams such as **JEE Main**. Despite its compact size, Aryabhatta 1.0 achieves **state-of-the-art performance** on exam-centric reasoning tasks with impressive **token efficiency** and low inference cost.

> 🚧 *Aryabhatta 1.0 is an **experimental release**. We are actively seeking feedback — please contribute in the Discussion tab of this repo.*

---

## 🧠 Key Features

- **Architecture**: 7B-parameter causal decoder-based model.
- **Exam-Centric Optimization**: Specifically tuned for JEE-level mathematics reasoning.
- **High Accuracy**:
  - **86%** on the **JEE Main January 2025** session.
  - **90.2%** on the **JEE Main April 2025** session.
- **Token Efficiency**: Operates effectively within a **~2K-token window**, compared to the ~8K typically required by other reasoning models.
- **Compute Efficient**: Trained on a **1x2 NVIDIA H100 GPU** setup using an optimized training pipeline.

---

## 🛠️ Training Details

- **Training Data**: ~130K problem-solution pairs curated from proprietary Physics Wallah exam datasets.
- **Training Pipeline**:
  - **Model Merging**
  - **Rejection Sampling**
  - **Supervised Fine-Tuning (SFT)**
  - **Reinforcement Learning with Verifiable Rewards (RLVR)**

### 🔀 Model Merging
We began with model merging (weighted averaging) to build a strong initialization (Aryabhatta 0.5) by combining the capabilities of diverse models (see the sketch after this list):
* Qwen 2.5 Math: A robust math-centric LLM with solid symbolic math foundations.
* Ace Math: An enhanced version of Qwen 2.5 Math, fine-tuned by NVIDIA for improved accuracy on mathematics benchmarks.
* DeepSeek R1 Distill Qwen: A long-form reasoning model, fine-tuned on reasoning traces distilled from DeepSeek R1.
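
As a rough illustration (the exact merge recipe and weights are not specified here), weighted averaging of architecture-compatible checkpoints can be sketched as follows; the repository IDs and mixing weights below are placeholders, not the actual recipe:

```python
# Illustrative sketch only: weighted averaging of parameter tensors across
# architecture-compatible checkpoints. NOT the actual Aryabhatta 0.5 recipe.
import torch
from transformers import AutoModelForCausalLM

# Placeholder checkpoints and mixing weights (weights sum to 1).
checkpoints = {
    "Qwen/Qwen2.5-Math-7B": 0.4,
    "nvidia/AceMath-7B-Instruct": 0.3,
    "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B": 0.3,
}

merged_state = None
for repo_id, weight in checkpoints.items():
    model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.float32)
    state = model.state_dict()
    if merged_state is None:
        merged_state = {k: weight * v for k, v in state.items()}
    else:
        for k, v in state.items():
            merged_state[k] += weight * v

# Load the averaged parameters into a fresh copy of the base architecture and save.
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-Math-7B", torch_dtype=torch.float32)
base.load_state_dict(merged_state)
base.save_pretrained("aryabhatta-0.5-merged")
```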

### 📚 Data Curation + Rejection Sampling
We extracted ~250K raw questions from Physics Wallah's internal database and applied aggressive filtering and cleaning:
* Removed: diagram-based, non-English, and option-heavy questions.
* Kept: questions matching the distribution of JEE Main 2019–2024.

Final curated dataset: ~130K high-quality questions.

For each question, we then:
* Generated 4 chain-of-thought (CoT) solutions using Aryabhatta 0.5.
* Retained only those leading to correct final answers.

Resulting dataset:
* ~100K questions
* ~350K high-quality CoTs

We used this dataset for SFT; a sketch of the rejection-sampling step is shown below.
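
As a rough illustration of this step (the actual pipeline code is not released), the rejection-sampling loop can be sketched as follows; `generate_cot` and `extract_final_answer` are hypothetical helpers:

```python
# Illustrative sketch of the rejection-sampling step described above:
# sample several CoTs per question and keep only those whose final answer
# matches the reference answer. Helper names are hypothetical placeholders.
def rejection_sample(questions, generate_cot, extract_final_answer, n_samples=4):
    kept = []
    for q in questions:  # q: {"question": str, "answer": str}
        good_cots = []
        for _ in range(n_samples):
            cot = generate_cot(q["question"])             # one sampled chain of thought
            if extract_final_answer(cot) == q["answer"]:  # keep only correct solutions
                good_cots.append(cot)
        if good_cots:
            kept.append({"question": q["question"], "cots": good_cots})
    return kept
```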

### 🎯 Reinforcement Learning with Verifiable Rewards (RLVR)
We used a custom in-house variant of Group Relative Policy Optimization (GRPO), adapted for math-specific reward functions, with two main modifications:
* Removed the KL-divergence penalty
* Removed clipping

We applied RLVR to the remaining ~30K questions, i.e., those not used for SFT; a simplified sketch of the objective follows.
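
For intuition only (the in-house GRPO variant is not released), a minimal GRPO-style objective with verifiable 0/1 rewards, no KL penalty, and no ratio clipping can be sketched as:

```python
# Minimal, illustrative GRPO-style loss with verifiable rewards:
# no KL penalty and no clipping, as described above.
# `per_sample_logprobs` are summed token log-probs of each sampled solution
# under the current policy; `rewards` are 0/1 verifier outcomes.
import torch

def grpo_style_loss(per_sample_logprobs: torch.Tensor, rewards: torch.Tensor) -> torch.Tensor:
    # Group-relative advantage: normalize rewards within the group of
    # completions sampled for the same question.
    advantages = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    # Plain policy-gradient objective on the group (no clipping, no KL term).
    return -(advantages.detach() * per_sample_logprobs).mean()

# Example: 8 sampled solutions for one question, 3 of which were verified correct.
logprobs = torch.randn(8, requires_grad=True)
rewards = torch.tensor([1., 0., 0., 1., 0., 1., 0., 0.])
loss = grpo_style_loss(logprobs, rewards)
loss.backward()
```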

This multi-phase training strategy allows Aryabhatta 1.0 to capture **pedagogy-aligned reasoning patterns**, making it highly effective for solving real student queries in mathematics.

---

## 📊 Performance Highlights

### Evaluation Setup
All evaluations were performed with temperature = 0.0, and we report pass@1 accuracy.

#### Evaluation Datasets
We evaluated the model on two sets of official JEE Main 2025 mathematics papers:
* January Session: 10 question papers containing 250 questions.
* April Session: 9 question papers containing 225 questions.

Each paper includes a mix of:
* Multiple Choice Questions (MCQs) with one correct option
* Numeric Answer Type (NAT) questions requiring precise numerical responses

#### Evaluation Metric
We used a composite evaluation metric to reflect real-world grading rigor and reduce false positives (a sketch of the grading cascade follows the list):

1. Float Match
   * Compares predicted and target answers within a numerical tolerance (±1e-9)
   * Handles rounding artifacts and small numerical errors robustly
2. String Match
   * Used for symbolic answers (e.g., fractions, radicals)
   * Uses strict exact match — predictions must match the ground truth character-for-character
3. LLM-as-Judge (GPT-4o-mini)
   * Used to check mathematical equivalence for ambiguously formatted answers
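
As a rough illustration (the actual evaluator is not released), the cascade can be sketched as follows; `llm_judge_equivalent` is a hypothetical helper, not the real grading code:

```python
# Illustrative cascade of the composite metric described above:
# 1) numeric comparison with tolerance, 2) strict string match,
# 3) fall back to an LLM judge for ambiguous formats.
def is_correct(prediction: str, target: str, llm_judge_equivalent) -> bool:
    # 1. Float match within tolerance.
    try:
        if abs(float(prediction) - float(target)) <= 1e-9:
            return True
    except ValueError:
        pass  # not both parseable as numbers

    # 2. Strict string match for symbolic answers (fractions, radicals, ...).
    if prediction.strip() == target.strip():
        return True

    # 3. LLM-as-judge for mathematical equivalence of ambiguous formats.
    return llm_judge_equivalent(prediction, target)
```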

### 🔹 Accuracy Comparison Across Models
*(Figure: accuracy comparison across models.)*
> *Aryabhatta has the best accuracy on JEE Main Maths, on par with frontier models.*

### 🔹 Accuracy vs Token Usage
*(Figure: accuracy versus token usage.)*
> *Aryabhatta is on par with frontier models in terms of accuracy vs token usage.*

---

## 🔧 Intended Use

**Primary Use Cases**:
- Competitive exam preparation (JEE Main level mathematics problems)
- Question answering and doubt-solving systems
- Educational tutoring and concept explanation

## 💡 How to Use

### 🧪 Using with 🤗 Transformers

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig

model_id = "PhysicsWallahAI/Aryabhatta-1.0"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Define stop strings to trim trailing artifacts from generations
stop_strings = ["<|im_end|>", "<|end|>", "<im_start|>", "```python\n", "<|im_start|>", "]}}]}}]"]

def strip_bad_tokens(s, stop_strings):
    for suffix in stop_strings:
        if s.endswith(suffix):
            return s[:-len(suffix)]
    return s

# Create generation config (can also set temperature, top_p, etc.)
generation_config = GenerationConfig(
    max_new_tokens=4096,
    stop_strings=stop_strings
)

query = 'Find all the values of \\sqrt[3]{1}'
messages = [{'role': 'system', 'content': 'Think step-by-step; put only the final answer inside \\boxed{}.'},
            {'role': 'user', 'content': query}]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
inputs = tokenizer([text], return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, generation_config=generation_config, tokenizer=tokenizer)

print(strip_bad_tokens(tokenizer.decode(outputs[0], skip_special_tokens=True), stop_strings))
```
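
Since the system prompt asks the model to place the final answer inside `\boxed{}`, a small helper (illustrative, not part of the official example) can extract that answer from the generated text:

```python
# Illustrative helper: extract the contents of the last \boxed{...} in a
# generation, handling nested braces. Not part of the official example.
def extract_boxed(text):
    start = text.rfind(r"\boxed{")
    if start == -1:
        return None
    i = start + len(r"\boxed{")
    depth, out = 1, []
    while i < len(text):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(out)
        out.append(ch)
        i += 1
    return None  # unbalanced braces

# e.g. extract_boxed(r"... so the roots are \boxed{1, \omega, \omega^2}") -> "1, \omega, \omega^2"
```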

---

### ⚡ Using with vLLM

To run the model efficiently using vLLM:

```python
from vllm import LLM, SamplingParams

# Initialize model (downloads from Hugging Face if not local)
llm = LLM(model="PhysicsWallahAI/Aryabhatta-1.0")

# Define prompt and sampling configuration
query = 'Find all the values of \\sqrt[3]{1}'
messages = [{'role': 'system', 'content': 'Think step-by-step; put only the final answer inside \\boxed{}.'},
            {'role': 'user', 'content': query}]
sampling_params = SamplingParams(temperature=0.0, max_tokens=4*1024, stop=["<|im_end|>", "<|end|>", "<im_start|>", "```python\n", "<|im_start|>", "]}}]}}]"])

# Run inference
results = llm.chat(messages, sampling_params)

# Print result
print(results[0].outputs[0].text.strip())
```

---

## 🚀 Roadmap

**Aryabhatta 2.0** (upcoming):
- Extending domain coverage to **Physics** and **Chemistry**
- Supporting **JEE Advanced**, **NEET**, and **Foundation syllabus**
- Further optimization for affordability and accuracy in real-time deployments

---

## 🤝 Citation

If you use this model, please cite:

```bibtex
@misc{aryabhatta2025,
  title  = {Aryabhatta 1.0: A compact, exam-focused language model tailored for mathematics in Indian competitive exams, especially JEE Main},
  author = {{Physics Wallah AI Research}},
  year   = {2025},
  note   = {\url{https://huggingface.co/PhysicsWallahAI/Aryabhatta-1.0}},
}
```