Low-probability Tokens Sustain Exploration in Reinforcement Learning with Verifiable Reward Paper • 2510.03222 • Published Oct 3, 2025 • 75
xbench: Tracking Agents Productivity Scaling with Profession-Aligned Real-World Evaluations Paper • 2506.13651 • Published Jun 16, 2025 • 8
The Climb Carves Wisdom Deeper Than the Summit: On the Noisy Rewards in Learning to Reason Paper • 2505.22653 • Published May 28, 2025 • 43