che111 committed on
Commit 3e83183 · verified · 1 Parent(s): f7785a6

Update README.md

Files changed (1):
1. README.md +6 -4
README.md CHANGED
```diff
@@ -5,8 +5,9 @@ license: mit
 # 🧠 AlphaMed
 
 This is the official model checkpoint for the paper:
-**[AlphaMed: Incentivizing Medical Reasoning with Reinforcement Learning Only](https://www.arxiv.org/abs/2505.17952)**
-AlphaMed is a medical large language model trained **without supervised fine-tuning or chain-of-thought (CoT) data**, relying solely on reinforcement learning to elicit step-by-step reasoning in complex medical tasks.
+**[AlphaMed: Incentivizing Medical Reasoning with Minimalist Rule-Based RL](https://www.arxiv.org/abs/2505.17952)**
+AlphaMed is a medical large language model trained **without supervised fine-tuning on chain-of-thought (CoT) data**,
+relying solely on reinforcement learning to elicit step-by-step reasoning in complex medical tasks.
 
 ## 🚀 Usage
 
@@ -21,7 +22,7 @@ To use the model, format your input prompt as:
 from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
 
 # Load model and tokenizer
-model_id = "your-hf-username/med-r1-zero" # Replace with actual repo path
+model_id = "che111/AlphaMed-3B-instruct-rl"  # Official checkpoint repo
 tokenizer = AutoTokenizer.from_pretrained(model_id)
 model = AutoModelForCausalLM.from_pretrained(model_id)
 
@@ -35,5 +36,6 @@ prompt = (
 )
 
 # Generate output
-output = pipe(prompt, max_new_tokens=256, do_sample=False)[0]["generated_text"]
+max_new_tokens = 8196
+output = pipe(prompt, max_new_tokens=max_new_tokens, do_sample=False)[0]["generated_text"]
 print(output)
```
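
For readers who want to run the updated snippet end to end, here is a minimal sketch assembling the fragments visible in this diff. The diff elides the README lines between the hunks, so the `pipe = pipeline(...)` construction and the example `prompt` below are assumptions, not the README's actual text; the repo id and the generation parameters come from the commit itself.

```python
# Minimal end-to-end sketch based on the fragments shown in this diff.
# Assumptions: the pipeline construction and the example prompt are
# not visible between the hunks, so both are illustrative placeholders.
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

# Load model and tokenizer (repo id taken from the commit)
model_id = "che111/AlphaMed-3B-instruct-rl"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Assumed: build a text-generation pipeline from the loaded objects
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

# Assumed placeholder prompt; the README's required prompt format
# is not visible in this diff
prompt = (
    "Question: A 45-year-old man presents with chest pain. "
    "What is the most likely diagnosis?\nAnswer:"
)

# Generate output (parameters as committed)
max_new_tokens = 8196
output = pipe(prompt, max_new_tokens=max_new_tokens, do_sample=False)[0]["generated_text"]
print(output)
```

Note that `do_sample=False` selects greedy decoding, so the output is deterministic for a given prompt, which suits benchmark-style medical QA.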