AI & Human Co-Improvement for Safer Co-Superintelligence Paper • 2512.05356 • Published 25 days ago • 8
AI & Human Co-Improvement for Safer Co-Superintelligence Paper • 2512.05356 • Published 25 days ago • 8 • 2
The End of Manual Decoding: Towards Truly End-to-End Language Models Paper • 2510.26697 • Published Oct 30 • 116 • 5
The End of Manual Decoding: Towards Truly End-to-End Language Models Paper • 2510.26697 • Published Oct 30 • 116 • 5
SPICE: Self-Play In Corpus Environments Improves Reasoning Paper • 2510.24684 • Published Oct 28 • 17
SPICE: Self-Play In Corpus Environments Improves Reasoning Paper • 2510.24684 • Published Oct 28 • 17 • 1
The Alignment Waltz: Jointly Training Agents to Collaborate for Safety Paper • 2510.08240 • Published Oct 9 • 41
The Era of Real-World Human Interaction: RL from User Conversations Paper • 2509.25137 • Published Sep 29 • 18
The Majority is not always right: RL training for solution aggregation Paper • 2509.06870 • Published Sep 8 • 16
The Majority is not always right: RL training for solution aggregation Paper • 2509.06870 • Published Sep 8 • 16 • 2
Jointly Reinforcing Diversity and Quality in Language Model Generations Paper • 2509.02534 • Published Sep 2 • 24
Jointly Reinforcing Diversity and Quality in Language Model Generations Paper • 2509.02534 • Published Sep 2 • 24
J1: Incentivizing Thinking in LLM-as-a-Judge via Reinforcement Learning Paper • 2505.10320 • Published May 15 • 24
SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks Paper • 2503.15478 • Published Mar 19 • 13
Step-KTO: Optimizing Mathematical Reasoning through Stepwise Binary Feedback Paper • 2501.10799 • Published Jan 18 • 15