Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning Paper • 2505.24726 • Published May 30, 2025
MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention Paper • 2506.13585 • Published Jun 16, 2025
Do Not Let Low-Probability Tokens Over-Dominate in RL for LLMs Paper • 2505.12929 • Published May 19, 2025
ReasonMed: A 370K Multi-Agent Generated Dataset for Advancing Medical Reasoning Paper • 2506.09513 • Published Jun 11, 2025
Lingshu: A Generalist Foundation Model for Unified Multimodal Medical Understanding and Reasoning Paper • 2506.07044 • Published Jun 8, 2025
ECoRAG: Evidentiality-guided Compression for Long Context RAG Paper • 2506.05167 • Published Jun 5, 2025
Geopolitical biases in LLMs: what are the "good" and the "bad" countries according to contemporary language models Paper • 2506.06751 • Published Jun 7, 2025