Every Step Evolves: Scaling Reinforcement Learning for Trillion-Scale Thinking Model Paper • 2510.18855 • Published 7 days ago • 60
SpatialLadder: Progressive Training for Spatial Reasoning in Vision-Language Models Paper • 2510.08531 • Published 19 days ago • 11
EasySteer: A Unified Framework for High-Performance and Extensible LLM Steering Paper • 2509.25175 • Published 29 days ago • 29
GSM8K-V: Can Vision Language Models Solve Grade School Math Word Problems in Visual Contexts Paper • 2509.25160 • Published 29 days ago • 30
Cooper: Co-Optimizing Policy and Reward Models in Reinforcement Learning for Large Language Models Paper • 2508.05613 • Published Aug 7 • 17
Test-Time Reinforcement Learning for GUI Grounding via Region Consistency Paper • 2508.05615 • Published Aug 7 • 22
LAPO: Internalizing Reasoning Efficiency via Length-Adaptive Policy Optimization Paper • 2507.15758 • Published Jul 21 • 35
Hierarchical Budget Policy Optimization for Adaptive Reasoning Paper • 2507.15844 • Published Jul 21 • 16
Double-Checker: Enhancing Reasoning of Slow-Thinking LLMs via Self-Critical Fine-Tuning Paper • 2506.21285 • Published Jun 26
Teaching LLMs According to Their Aptitude: Adaptive Reasoning for Mathematical Problem Solving Paper • 2502.12022 • Published Feb 17
SVGenius: Benchmarking LLMs in SVG Understanding, Editing and Generation Paper • 2506.03139 • Published Jun 3 • 17
Do Large Language Models Excel in Complex Logical Reasoning with Formal Language? Paper • 2505.16998 • Published May 22 • 2
Triad: A Framework Leveraging a Multi-Role LLM-based Agent to Solve Knowledge Base Question Answering Paper • 2402.14320 • Published Feb 22, 2024
ViewSpatial-Bench: Evaluating Multi-perspective Spatial Localization in Vision-Language Models Paper • 2505.21500 • Published May 27 • 13
UGPhysics: A Comprehensive Benchmark for Undergraduate Physics Reasoning with Large Language Models Paper • 2502.00334 • Published Feb 1
AskToAct: Enhancing LLMs Tool Use via Self-Correcting Clarification Paper • 2503.01940 • Published Mar 3
Let LLMs Break Free from Overthinking via Self-Braking Tuning Paper • 2505.14604 • Published May 20 • 23
Mind the Gap: Bridging Thought Leap for Improved Chain-of-Thought Tuning Paper • 2505.14684 • Published May 20 • 24