Beyond Static Tools: Test-Time Tool Evolution for Scientific Reasoning Paper • 2601.07641 • Published 13 days ago • 45
Rewarding the Rare: Uniqueness-Aware RL for Creative Problem Solving in LLMs Paper • 2601.08763 • Published 12 days ago • 140
Collaborative Multi-Agent Test-Time Reinforcement Learning for Reasoning Paper • 2601.09667 • Published 11 days ago • 82
MHLA: Restoring Expressivity of Linear Attention via Token-Level Multi-Head Paper • 2601.07832 • Published 13 days ago • 51
DeepResearchEval: An Automated Framework for Deep Research Task Construction and Agentic Evaluation Paper • 2601.09688 • Published 11 days ago • 123
Aster2024/swift-reasoning-rollouts-deepscaler-ministral8b Viewer • Updated 12 days ago • 10k • 28 • 2
CLUE: Non-parametric Verification from Experience via Hidden-State Clustering Paper • 2510.01591 • Published Oct 2, 2025 • 28
Reward Inside the Model: A Lightweight Hidden-State Reward Model for LLM's Best-of-N sampling Paper • 2505.12225 • Published May 18, 2025 • 9
ProcessBench: Identifying Process Errors in Mathematical Reasoning Paper • 2412.06559 • Published Dec 9, 2024 • 86