Online SFT for LLM Reasoning: Surprising Effectiveness of Self-Tuning without Rewards Paper • 2510.18814 • Published Oct 21, 2025