	---
	library_name: transformers
	tags:
	- text-generation-inference
	- PRM
	- Code
	- Math
	license: apache-2.0
	language:
	- en
	base_model:
	- Qwen/Qwen2.5-1.5B-Instruct
	pipeline_tag: text-generation
	---

	![PRM.png](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/2inJGKPx_BrMcID7Osto-.png)

	# Deepthink-1.5B-Open-PRM

	> Deepthink-1.5B-Open-PRM is a process-supervised reasoning model fine-tuned from Qwen2.5 1.5B using Process Reward Models (PRM). It excels at step-by-step mathematical problem solving in both English and Simplified Chinese, offering interpretable, logically structured responses for use in education, STEM tutoring, and lightweight math agents.

	## Key Features

	1. Process Reward Model Supervision (PRM)
	Fine-tuned with PRMs to reward high-quality intermediate reasoning steps — fostering step-by-step interpretability, accuracy, and educational transparency.

	2. Compact Foundation (Qwen2.5 1.5B)
	Built on the efficient Qwen2.5 1.5B architecture and refined through distillation and reward-based alignment, keeping the 1.5B parameter count while balancing reasoning quality and deployment efficiency.

	3. Bilingual Math Capability
	Fluent in solving and explaining math problems in both English and Simplified Chinese, making it ideal for multilingual classrooms and tutoring platforms.

	4. Process-Supervised Math Reasoning
	Trained to reason like a teacher — showing each logical step before delivering an answer. Ideal for learners who need to understand the “how” and “why” behind each solution.

	5. Long-Context & Word Problem Reasoning
	Especially proficient with multi-step arithmetic, word problems, logic puzzles, and middle school to early college-level math.
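To make the idea of rewarding intermediate steps concrete, here is a minimal, generic sketch of how per-step scores from a process reward model can be collapsed into a single solution-level score and used to pick between candidates. The scores, candidate names, and the `aggregate_step_scores` helper are all hypothetical; this illustrates the general PRM idea and is not a description of how this model was trained.

```python
from math import prod


def aggregate_step_scores(step_scores, how="min"):
    """Collapse per-step rewards (each in [0, 1]) into one solution-level score.

    "min" penalises a solution for its weakest step; "prod" treats the
    steps as independent and multiplies their scores.
    """
    if not step_scores:
        raise ValueError("need at least one step score")
    if how == "min":
        return min(step_scores)
    if how == "prod":
        return prod(step_scores)
    raise ValueError(f"unknown aggregation: {how}")


# Hypothetical per-step rewards for two candidate solutions (made-up numbers).
candidates = {
    "solution_a": [0.9, 0.8, 0.95],
    "solution_b": [0.99, 0.2, 0.99],  # one weak step in the middle
}

# Pick the candidate whose weakest step is strongest.
best = max(candidates, key=lambda name: aggregate_step_scores(candidates[name]))
print(best)
```

With `min` aggregation a single poorly scored step sinks the whole solution, which matches the intuition that one wrong step usually invalidates the final answer.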

	## Quickstart with Transformers

	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer

	model_name = "prithivMLmods/Deepthink-1.5B-Open-PRM"

	# Load the model; dtype and device placement are chosen automatically
	# (device_map requires the accelerate package to be installed).
	model = AutoModelForCausalLM.from_pretrained(
	    model_name,
	    torch_dtype="auto",
	    device_map="auto",
	)
	tokenizer = AutoTokenizer.from_pretrained(model_name)

	prompt = "Solve: A tank can be filled by one pipe in 6 hours and emptied by another in 9 hours. How long will it take to fill the tank if both pipes are opened together?"

	messages = [
	    {"role": "system", "content": "You are a helpful math tutor who explains each step clearly."},
	    {"role": "user", "content": prompt},
	]

	# Render the conversation with the model's chat template.
	text = tokenizer.apply_chat_template(
	    messages,
	    tokenize=False,
	    add_generation_prompt=True,
	)
	model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

	generated_ids = model.generate(
	    **model_inputs,
	    max_new_tokens=512,
	)
	# Drop the prompt tokens so only the newly generated answer is decoded.
	generated_ids = [
	    output_ids[len(input_ids):]
	    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
	]

	response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
	print(response)
	```
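As a sanity check on the sample prompt, the expected answer can be worked out by hand: the filling pipe adds 1/6 of the tank per hour and the draining pipe removes 1/9, so the net rate is 1/6 - 1/9 = 1/18 of the tank per hour. A small sketch with exact fractions (the variable names are only illustrative) confirms the reference value to compare the model's response against:

```python
from fractions import Fraction

fill_rate = Fraction(1, 6)    # tank per hour added by the filling pipe
drain_rate = Fraction(1, 9)   # tank per hour removed by the emptying pipe

net_rate = fill_rate - drain_rate   # 1/18 of the tank per hour
hours_to_fill = 1 / net_rate        # time to fill one whole tank

print(hours_to_fill)  # 18
```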

	## Intended Use

	- Math Education Agents: Tutors that explain problems step by step, helping users build understanding through reasoning.
	- Bilingual Learning Platforms: Apps that teach math in both Chinese and English.
	- STEM-Oriented Assistants: Assistants that support early-stage problem solving in science and engineering contexts.
	- Lightweight LLM Deployments: Optimized for low-resource environments, from browsers to mobile devices.
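For a rough sense of what "lightweight" means here, the memory needed just to hold the weights can be estimated from the parameter count and the bytes per parameter. The sketch below assumes a nominal 1.5e9 parameters and counts weights only; activations, the KV cache, and framework overhead are not included, and the lower-precision rows assume the model is quantized by some external tool.

```python
# Bytes needed to store one parameter at common precisions.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "int8": 1, "int4": 0.5}


def weight_memory_gib(num_params, dtype):
    """Approximate weight-only memory in GiB for a given precision."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


NUM_PARAMS = 1.5e9  # nominal size taken from the model name (an approximation)

for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_memory_gib(NUM_PARAMS, dtype):.2f} GiB")
```

At 16-bit precision this works out to roughly 2.8 GiB for the weights alone.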

	## Limitations

	1. Domain Specificity
	Primarily tuned for math reasoning — performance may degrade on unrelated tasks like creative writing or open dialogue.

	2. Model Size Constraint
	While efficient, a 1.5B-parameter model may struggle with highly abstract or very long multi-domain tasks.

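	3. PRM Bias Generalization
	PRM training can bias the model toward reasoning structures that the reward model favors, so a well-formatted solution is not necessarily a correct one. Results should still be reviewed for correctness and completeness.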

	4. Prompt Structure Sensitivity
	Well-structured queries yield more accurate and educationally useful outputs.
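Limitations 3 and 4 suggest two simple habits: send a clearly structured prompt, and check the final answer independently when a reference value is available. The sketch below shows one possible way to do both. `build_messages`, `extract_final_number`, and `answer_matches` are hypothetical helpers, and the instruction wording is an assumption rather than a documented prompt format for this model.

```python
import re

DEFAULT_SYSTEM = "You are a helpful math tutor who explains each step clearly."


def build_messages(problem, system_prompt=DEFAULT_SYSTEM):
    """Wrap a math problem in a consistent, explicit chat structure."""
    problem = " ".join(problem.split())  # collapse stray whitespace
    if not problem:
        raise ValueError("problem must not be empty")
    user = (
        "Solve the following problem step by step, "
        "then state the final answer on its own line.\n\n"
        f"Problem: {problem}"
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user},
    ]


def extract_final_number(text):
    """Return the last number that appears in the text, or None."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", text)
    return float(matches[-1]) if matches else None


def answer_matches(response, expected, tol=1e-6):
    """Check only the final number; the reasoning steps are not verified."""
    value = extract_final_number(response)
    return value is not None and abs(value - expected) <= tol


messages = build_messages("  A tank fills in 6 hours and empties in 9.  How long to fill?  ")
sample = "Net rate is 1/6 - 1/9 = 1/18 tank per hour, so the tank fills in 18 hours."
print(answer_matches(sample, 18))
```

The `messages` list can be passed to `tokenizer.apply_chat_template` exactly as in the Quickstart. The numeric check is deliberately crude: it looks only at the last number in the response, so a human review of the steps is still needed.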