---
license: cc-by-nc-4.0
tags:
- small-language-model
- jee
- exam-centric
- indian-education
- reinforcement-learning
- supervised-finetuning
- model-merging
- rejection-sampling
- mathematics
- ai4education
- physicswallah
language:
- en
model_name: PhysicsWallah/Aryabhata-1.0
model_creator: Physics Wallah AI Research
model_type: Causal decoder-based model
base_model: Qwen/Qwen2.5-Math-7B
pipeline_tag: text-generation
library_name: transformers
---
# Aryabhata 1.0: An exam-focused language model for JEE Math

## Overview
**Aryabhata 1.0** is a 7B parameter small language model for mathematics developed by **Physics Wallah AI Research**, optimized for high-stakes Indian competitive exams like **JEE Mains**. Despite its compact size, Aryabhata 1.0 achieves **state-of-the-art performance** on exam-centric reasoning tasks with impressive **token efficiency** and low inference cost.
> 🚧 *Aryabhata 1.0 is an **experimental release**. We are actively seeking feedback — please contribute in the Discussion tab of this repo.*
---
## 🧠 Key Features
- **Architecture**: 7B parameter causal decoder-based model.
- **Exam-Centric Optimization**: Specifically tuned for JEE-level Mathematics reasoning.
- **High Accuracy**:
  - **86%** on the **JEE Mains January 2025** session.
  - **90.2%** on the **JEE Mains April 2025** session.
- **Token Efficiency**: Operates effectively within a **~2K-token window**, compared with the ~8K tokens typically required by other reasoning models.
- **Compute Efficient**: Trained on a **1×2 NVIDIA H100 GPU** configuration using an optimized training pipeline.
---
## 🛠️ Training Details
- **Training Data**: ~130K problem-solution pairs curated from proprietary Physics Wallah exam datasets.
- **Training Pipeline**:
  - **Model Merging**
  - **Rejection Sampling**
  - **Supervised Fine-Tuning (SFT)**
  - **Reinforcement Learning with Verifiable Rewards (RLVR)**
### 🔀 Model Merging
We began with model merging (weighted averaging) to build a strong initialization (Aryabhata 0.5) by combining the capabilities of diverse models (a sketch of the merging step follows the list below):
* Qwen 2.5 Math: A robust math-centric LLM with solid symbolic math foundations.
* Ace Math: An enhanced version of Qwen 2.5 Math, fine-tuned by NVIDIA for improved accuracy in mathematics benchmarks.
* DeepSeek R1 Distill Qwen: A long-form reasoning model, fine-tuned on reasoning traces distilled from DeepSeek R1.
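The exact merge recipe and weights are internal; as a rough illustration, a weighted parameter average over architecturally compatible checkpoints could be sketched as below. The repository names, merge weights, and strict key matching are assumptions, not the actual configuration.

```python
# Hypothetical sketch of weighted parameter averaging; the real merge
# recipe and weights behind Aryabhata 0.5 are not published here.
import torch
from transformers import AutoModelForCausalLM

# Assumed checkpoint IDs and merge weights (weights sum to 1.0).
checkpoints = {
    "Qwen/Qwen2.5-Math-7B": 0.4,
    "nvidia/AceMath-7B-Instruct": 0.3,
    "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B": 0.3,
}

merged_state = None
for name, weight in checkpoints.items():
    model = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.bfloat16)
    state = model.state_dict()
    if merged_state is None:
        merged_state = {k: weight * v.float() for k, v in state.items()}
    else:
        # Assumes all checkpoints share identical state-dict keys and shapes.
        for k, v in state.items():
            merged_state[k] += weight * v.float()
    del model  # free memory before loading the next checkpoint

# Load the averaged weights into one of the base architectures and save.
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-Math-7B", torch_dtype=torch.bfloat16)
base.load_state_dict({k: v.to(torch.bfloat16) for k, v in merged_state.items()})
base.save_pretrained("aryabhata-0.5-merged")
```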
### 📚 Data Curation + Rejection Sampling
We extracted ~250K raw questions from Physics Wallah's internal database and applied aggressive filtering and cleaning:
* Removed: diagram-based, non-English, and option-heavy questions.
* Kept: questions matching the distribution of JEE Main 2019–2024.
Final curated dataset: ~130K high-quality questions.
For each question:
* Generated 4 CoTs using Aryabhata 0.5.
* Retained only those leading to correct final answers.
Resulting Dataset:
* ~100K questions
* ~350K high-quality CoTs
We used this dataset for SFT.
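As a rough illustration of this rejection-sampling loop (the actual generation and answer-checking tooling is internal; `generate_cot` and `extract_final_answer` below are hypothetical helpers):

```python
# Hypothetical sketch: sample several CoTs per question with Aryabhata 0.5
# and keep only those whose final answer matches the ground truth.
def rejection_sample(questions, generate_cot, extract_final_answer, n_samples=4):
    """questions: list of dicts with 'question' and 'answer' keys.
    generate_cot / extract_final_answer stand in for internal tooling."""
    sft_data = []
    for item in questions:
        for _ in range(n_samples):
            cot = generate_cot(item["question"])  # sampled with temperature > 0
            if extract_final_answer(cot) == item["answer"]:
                sft_data.append({"question": item["question"], "cot": cot})
    return sft_data
```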
### 🎯 Reinforcement Learning with Verifiable Rewards (RLVR)
We used a custom in-house variant of Group Relative Policy Optimization (GRPO), adapted for math-specific reward functions, with two modifications:
* Removed the KL-divergence penalty
* Removed clipping
We used RLVR on the remaining ~30K questions.
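For illustration, a generic GRPO-style objective with the KL penalty and clipping removed might look like the sketch below. This is not the in-house implementation; the tensor shapes and the binary reward are assumptions.

```python
import torch

def grpo_loss_no_kl_no_clip(token_logprobs, rewards):
    """Generic GRPO-style objective without KL penalty or ratio clipping.

    token_logprobs: (G, T) per-token log-probs of G sampled completions for one
                    question under the current policy (padding masked upstream).
    rewards:        (G,) verifiable rewards, e.g. 1.0 if the boxed final answer
                    matches the ground truth, else 0.0.
    """
    # Group-relative advantage: standardise rewards within the group of G samples.
    advantages = (rewards - rewards.mean()) / (rewards.std() + 1e-6)           # (G,)
    # Plain policy-gradient term: maximise advantage-weighted log-likelihood.
    per_sample_loss = -(advantages.unsqueeze(1) * token_logprobs).mean(dim=1)  # (G,)
    return per_sample_loss.mean()

# Example: 4 sampled solutions, two of which reach the correct final answer.
logps = torch.randn(4, 128, requires_grad=True)   # stand-in for per-token log-probs
rewards = torch.tensor([1.0, 0.0, 1.0, 0.0])
loss = grpo_loss_no_kl_no_clip(logps, rewards)
loss.backward()  # would backpropagate through the policy in a real training step
```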
This multi-phase training strategy allows Aryabhata 1.0 to capture **pedagogy-aligned reasoning patterns**, making it highly effective for solving real student queries in mathematics.
---
## 📊 Performance Highlights
### Evaluation Setup
All evaluations were performed with temperature = 0.0, and we report pass@1 accuracy.
#### Evaluation Datasets
We evaluated the model on two sets of official JEE Mains 2025 mathematics papers:
* January Session: 10 question papers containing 250 questions.
* April Session: 9 question papers containing 225 questions.
Each paper includes a mix of:
* Multiple Choice Questions (MCQs) with one correct option
* Numeric Answer Type (NAT) questions requiring precise numerical responses
#### Evaluation Metric
We used a composite evaluation metric to reflect real-world grading rigor and reduce false positives (a simplified sketch of the cascade follows this list):
1. Float Match
   * Compares predicted and target answers within a tolerance (±1e-9)
   * Handles rounding artifacts and small numerical errors robustly
2. String Match
   * Used for symbolic answers (e.g., fractions, radicals)
   * Uses strict exact match: predictions must match the ground truth character-for-character
3. LLM-as-Judge (GPT-4o-mini)
   * Used to check mathematical equivalence when answer formats are ambiguous
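A simplified sketch of how this cascade could be wired up (the LLM-as-judge step is left as a hypothetical `ask_llm_judge` callback, not an actual API):

```python
def is_correct(prediction: str, target: str, ask_llm_judge=None, tol=1e-9) -> bool:
    """Cascade: float match -> strict string match -> LLM-as-judge fallback."""
    # 1. Float match within a small tolerance.
    try:
        if abs(float(prediction) - float(target)) <= tol:
            return True
    except ValueError:
        pass  # not a plain number; fall through to symbolic comparison
    # 2. Strict string match for symbolic answers (fractions, radicals, ...).
    if prediction.strip() == target.strip():
        return True
    # 3. Fall back to an LLM judge (e.g. GPT-4o-mini) for ambiguous formats.
    if ask_llm_judge is not None:
        return ask_llm_judge(prediction, target)
    return False
```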
### 🔹 Accuracy Comparison Across Models

> *Aryabhata has the best accuracy on JEE Main Maths, on par with frontier models*
### 🔹 Accuracy vs Token Usage

> *Aryabhata is on par with frontier models in terms of accuracy vs token usage*
---
## 🔧 Intended Use
**Primary Use Cases**:
- Competitive exam preparation (JEE Main level mathematics problems)
- Question answering and doubt-solving systems
- Educational tutoring and concept explanation
## 💡 How to Use
### 🧪 Using with 🤗 Transformers
```python
from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig

model_id = "PhysicsWallahAI/Aryabhata-1.0"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Define stop strings
stop_strings = ["<|im_end|>", "<|end|>", "<im_start|>", "```python\n", "<|im_start|>", "]}}]}}]"]

def strip_bad_tokens(s, stop_strings):
    for suffix in stop_strings:
        if s.endswith(suffix):
            return s[:-len(suffix)]
    return s

# Create generation config (can also set temperature, top_p, etc.)
generation_config = GenerationConfig(
    max_new_tokens=4096,
    stop_strings=stop_strings,
)

query = 'Find all the values of \\sqrt[3]{1}'
messages = [
    {'role': 'system', 'content': 'Think step-by-step; put only the final answer inside \\boxed{}.'},
    {'role': 'user', 'content': query},
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer([text], return_tensors="pt")

outputs = model.generate(**inputs, generation_config=generation_config, tokenizer=tokenizer)
print(strip_bad_tokens(tokenizer.decode(outputs[0], skip_special_tokens=True), stop_strings))
```
---
### ⚡ Using with vLLM
To run the model efficiently using vLLM:
```python
from vllm import LLM, SamplingParams
# Initialize model (downloads from Hugging Face if not local)
llm = LLM(model="PhysicsWallahAI/Aryabhata-1.0")
# Define prompt and sampling configuration
query = 'Find all the values of \\sqrt[3]{1}'
messages = [{'role': 'system', 'content': 'Think step-by-step; put only the final answer inside \\boxed{}.'},
{'role': 'user', 'content': query}]
sampling_params = SamplingParams(temperature=0.0, max_tokens=4*1024, stop=["<|im_end|>", "<|end|>", "<im_start|>", "```python\n", "<|im_start|>", "]}}]}}]"])
# Run inference
results = llm.chat(messages, sampling_params)
# Print result
print(results[0].outputs[0].text.strip())
```
---
Read more about Aryabhata 1.0 in our [Technical Report](https://arxiv.org/abs/2508.08665).
---
## 🚀 Roadmap
**Aryabhata 2.0** (Upcoming):
- Extending domain coverage to **Physics** and **Chemistry**
- Supporting **JEE Advanced**, **NEET**, and **Foundation syllabus**
- Further optimization for affordability and accuracy in real-time deployments
---
## 🤝 Citation
If you use this model, please cite:
```bibtex
@misc{Aryabhata2025,
  title  = {Aryabhata 1.0: A compact, exam-focused language model tailored for mathematics in Indian competitive exams, especially JEE Main},
  author = {Physics Wallah AI Research},
  year   = {2025},
  note   = {\url{https://huggingface.co/PhysicsWallahAI/Aryabhata-1.0}},
}
```