Stabilizing Reinforcement Learning with LLMs: Formulation and Practices • Paper • 2512.01374 • Published about 1 month ago • 93
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning • Paper • 2501.12948 • Published Jan 22, 2025 • 431