SafeWork-R1: Coevolving Safety and Intelligence under the AI-45$^{\circ}$ Law Paper • 2507.18576 • Published Jul 24
Taming Masked Diffusion Language Models via Consistency Trajectory Reinforcement Learning with Fewer Decoding Step Paper • 2509.23924 • Published Sep 28
Rethinking Entropy Regularization in Large Reasoning Models Paper • 2509.25133 • Published Sep 29
Conditional Advantage Estimation for Reinforcement Learning in Large Reasoning Models Paper • 2509.23962 • Published Sep 28
Beyond External Monitors: Enhancing Transparency of Large Language Models for Easier Monitoring Paper • 2502.05242 • Published Feb 7