Low-probability Tokens Sustain Exploration in Reinforcement Learning with Verifiable Reward • Paper • 2510.03222 • Published Oct 3, 2025
Rubric-Based Benchmarking and Reinforcement Learning for Advancing LLM Instruction Following • Paper • 2511.10507 • Published Nov 13, 2025