---
datasets:
- m-a-p/TreePO_data
base_model:
- Qwen/Qwen2.5-7B
---
We release the resources for the paper TreePO:
- Checkpoint with average weighted subgroup advantages + more diverse initial divergence (the final one). ← You are here.
- Checkpoint with average weighted subgroup advantages + fixed divergence.
- The training dataset, which consists of DeepScaleR and SimpleRL math reasoning data.
More links: