T-pro 2.0: An Efficient Russian Hybrid-Reasoning Model and Playground • Paper • 2512.10430 • Published 14 days ago • 112
Enhancing Vision-Language Model Training with Reinforcement Learning in Synthetic Worlds for Real-World Success • Paper • 2508.04280 • Published Aug 6 • 35
The Differences Between Direct Alignment Algorithms are a Blur • Paper • 2502.01237 • Published Feb 3 • 113
PingPong: A Benchmark for Role-Playing Language Models with User Emulation and Multi-Model Evaluation • Paper • 2409.06820 • Published Sep 10, 2024 • 68