Training-Free Group Relative Policy Optimization Paper • 2510.08191 • Published 20 days ago • 43
Learn the Ropes, Then Trust the Wins: Self-imitation with Progressive Exploration for Agentic Reinforcement Learning Paper • 2509.22601 • Published Sep 26 • 29
DuPO: Enabling Reliable LLM Self-Verification via Dual Preference Optimization Paper • 2508.14460 • Published Aug 20 • 82
Beyond the Trade-off: Self-Supervised Reinforcement Learning for Reasoning Models' Instruction Following Paper • 2508.02150 • Published Aug 4 • 36
HunyuanWorld 1.0: Generating Immersive, Explorable, and Interactive 3D Worlds from Words or Pixels Paper • 2507.21809 • Published Jul 29 • 131
Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination Paper • 2507.10532 • Published Jul 14 • 88
WebSailor: Navigating Super-human Reasoning for Web Agent Paper • 2507.02592 • Published Jul 3 • 120
RAIF Collection Datasets and models in the paper "Incentivizing Reasoning for Advanced Instruction-Following of Large Language Models" [github.com/yuleiqin/RAIF]. • 12 items • Updated Jul 17 • 1
WebChoreArena: Evaluating Web Browsing Agents on Realistic Tedious Web Tasks Paper • 2506.01952 • Published Jun 2 • 10
Incentivizing Reasoning for Advanced Instruction-Following of Large Language Models Paper • 2506.01413 • Published Jun 2 • 16
🧠Reasoning datasets Collection Datasets with reasoning traces for math and code released by the community • 24 items • Updated May 19 • 172
Rethinking Data Selection at Scale: Random Selection is Almost All You Need Paper • 2410.09335 • Published Oct 12, 2024 • 17
Leveraging Open Knowledge for Advancing Task Expertise in Large Language Models Paper • 2408.15915 • Published Aug 28, 2024 • 19
Unleashing the Power of Data Tsunami: A Comprehensive Survey on Data Assessment and Selection for Instruction Tuning of Language Models Paper • 2408.02085 • Published Aug 4, 2024 • 19