QueST: Incentivizing LLMs to Generate Difficult Problems Paper • 2510.17715 • Published 8 days ago • 31
Depth-Breadth Synergy in RLVR: Unlocking LLM Reasoning Gains with Adaptive Exploration Paper • 2508.13755 • Published Aug 19 • 14
InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency Paper • 2508.18265 • Published Aug 25 • 201
Beyond Pass@1: Self-Play with Variational Problem Synthesis Sustains RLVR Paper • 2508.14029 • Published Aug 19 • 118