MarsRL: Advancing Multi-Agent Reasoning System via Reinforcement Learning with Agentic Pipeline Parallelism
Paper: 2511.11373
Supported models: Qwen3/DeepSeekV3.1/DeepSeek R1. To use other models, modify llm_client.py.
python3 vc_reasoning_system.py solver_ip_port_1,solver_ip_port_2,... vc_ip_port_1,vc_ip_port_2,... test_file output_dir
For example: python3 vc_reasoning_system.py 8.8.8.8:8021,12.34.56.78:8021 8.8.8.8:8021,12.34.56.78:8021 ./outputs/debug ./test_corpus/aime2025.jsonl
This step runs the reasoning system on each problem in the given test_file; the predicted results are written to output_dir.
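The solver and V-C endpoints are passed as comma-separated ip:port lists. As a minimal sketch of how such an argument could be split into (host, port) pairs, assuming nothing about how vc_reasoning_system.py actually parses it (the helper name parse_endpoints is hypothetical):

```python
from typing import List, Tuple


def parse_endpoints(arg: str) -> List[Tuple[str, int]]:
    """Split 'host:port,host:port' into a list of (host, port) pairs.

    Hypothetical helper for illustration; not taken from the repository.
    """
    endpoints = []
    for item in arg.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing comma
        host, sep, port = item.rpartition(":")
        if not sep or not host:
            raise ValueError(f"expected host:port, got {item!r}")
        endpoints.append((host, int(port)))
    return endpoints


solvers = parse_endpoints("8.8.8.8:8021,12.34.56.78:8021")
print(solvers)
```

Validating the list up front in this way surfaces a malformed address before any requests are sent.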
python3 extract_solution.py result_dir test_file
For example: python3 extract_solution.py ./outputs/debug ./test_corpus/aime_2025.jsonl
This step generates a file named "eval_overalljsonl" in result_dir. You can compute the evaluation metrics from this file.
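The file produced by extract_solution.py can then be scored. The sketch below computes exact-match accuracy over JSONL records. The field names "prediction" and "answer" are assumptions made for illustration, since the actual schema of the file is not documented here; adjust the keys to match the real output.

```python
import json
from typing import Iterable


def accuracy(lines: Iterable[str],
             pred_key: str = "prediction",  # assumed field name
             gold_key: str = "answer") -> float:  # assumed field name
    """Return the exact-match accuracy over JSONL lines, skipping blank lines."""
    total = 0
    correct = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        total += 1
        # Compare as stripped strings so "204" and 204 count as equal.
        if str(record[pred_key]).strip() == str(record[gold_key]).strip():
            correct += 1
    return correct / total if total else 0.0


# Two made-up records, only to show the call.
sample = [
    '{"prediction": "204", "answer": "204"}',
    '{"prediction": "17", "answer": "113"}',
]
print(accuracy(sample))
```

To score a real file, pass an open file handle instead of the sample list, since iterating over a file yields its lines.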
Our implementation is heavily built on verl.
Our models are trained on top of Qwen3-30B-A3B-Thinking-2507.
Our V-C Reasoning system is built on the IMO25 pipeline.
Thanks for their wonderful work.
@article{Marsrl2025,
title = {MarsRL: Advancing Multi-Agent Reasoning System via Reinforcement Learning with Agentic Pipeline Parallelism},
author = {Shulin Liu and Dong Du and Tao Yang and Yang Li and Boyu Qiu},
year = {2025}
}