TreePO-Qwen2.5-7B / README.md
metadata
datasets:
  - m-a-p/TreePO_data
base_model:
  - Qwen/Qwen2.5-7B

We release the resources for the paper TreePO:

  • Checkpoint with average weighted subgroup advantages + more diverse initial divergence (the final one). ← You are here.
  • Checkpoint with average weighted subgroup advantages + fixed divergence.
  • The training dataset (m-a-p/TreePO_data), which consists of DeepScaleR and SimpleRL math reasoning data.
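
A minimal sketch of how this checkpoint might be loaded with the Hugging Face `transformers` library. The repository id `m-a-p/TreePO-Qwen2.5-7B` is an assumption inferred from the dataset's organization and is not confirmed by this README; the prompt wording and the helper names `build_prompt` and `generate_answer` are likewise hypothetical. `device_map="auto"` additionally requires the `accelerate` package.

```python
# ASSUMPTION: the repo id below is a guess (organization taken from the
# dataset id m-a-p/TreePO_data); replace it with the actual model path.
MODEL_ID = "m-a-p/TreePO-Qwen2.5-7B"


def build_prompt(question: str) -> str:
    """Hypothetical plain-text prompt; the format used in training is not stated here."""
    return f"Question: {question}\nLet's think step by step.\n"


def generate_answer(question: str, model_id: str = MODEL_ID, max_new_tokens: int = 512) -> str:
    """Load the checkpoint and generate a completion for one math question."""
    # Imported lazily so the helpers above can be used without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype="auto",  # use the dtype stored in the checkpoint
        device_map="auto",   # needs `accelerate`; places weights on available devices
    )

    inputs = tokenizer(build_prompt(question), return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Drop the prompt tokens and decode only the newly generated part.
    prompt_len = inputs["input_ids"].shape[1]
    return tokenizer.decode(output_ids[0][prompt_len:], skip_special_tokens=True)
```

Calling `generate_answer("What is 12 * 7?")` downloads the weights on first use, so it needs network access and enough memory for a 7B-parameter model.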

More links: