Towards a Unified View of Large Language Model Post-Training Paper • 2509.04419 • Published 4 days ago • 59
UserBench: An Interactive Gym Environment for User-Centric Agents Paper • 2507.22034 • Published Jul 29 • 29
MiniCPM4 Collection MiniCPM4: Ultra-Efficient LLMs on End Devices • 29 items • Updated 1 day ago • 75
The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models Paper • 2505.22617 • Published May 28 • 130
Process Reinforcement through Implicit Rewards Article • By ganqu and 1 other • Jan 3 • 29
Eurus Collection Advancing LLM Reasoning Generalists with Preference Trees • 11 items • Updated Aug 7 • 25