Self-Rewarding Rubric-Based Reinforcement Learning for Open-Ended Reasoning Paper • 2509.25534 • Published Sep 19, 2025 • 3
BigCodeArena • Running • 37 • Compare two AI models by sending them code and seeing their responses
MedReseacher-R1: Expert-Level Medical Deep Researcher via A Knowledge-Informed Trajectory Synthesis Framework Paper • 2508.14880 • Published Aug 20, 2025 • 15
Learning to Align, Aligning to Learn: A Unified Approach for Self-Optimized Alignment Paper • 2508.07750 • Published Aug 11, 2025 • 21
The Ultra-Scale Playbook • Running • 3.66k • The ultimate guide to training LLMs on large GPU clusters
Congliu/Chinese-DeepSeek-R1-Distill-data-110k Viewer • Updated Feb 21, 2025 • 110k • 427 • 720