A Survey of Reinforcement Learning for Large Reasoning Models Paper • 2509.08827 • Published 1 day ago • 110
On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification Paper • 2508.05629 • Published Aug 7 • 175
WebExplorer: Explore and Evolve for Training Long-Horizon Web Agents Paper • 2509.06501 • Published 4 days ago • 72
On-Policy RL Meets Off-Policy Experts: Harmonizing Supervised Fine-Tuning and Reinforcement Learning via Dynamic Weighting Paper • 2508.11408 • Published 28 days ago • 8