Intern-S1: A Scientific Multimodal Foundation Model • Paper • arXiv:2508.15763 • Published Aug 21, 2025
UTBoost: Rigorous Evaluation of Coding Agents on SWE-Bench • Paper • arXiv:2506.09289 • Published Jun 10, 2025
Dynamic Benchmarking of Reasoning Capabilities in Code Large Language Models Under Data Contamination • Paper • arXiv:2503.04149 • Published Mar 6, 2025
CPRet: A Dataset, Benchmark, and Model for Retrieval in Competitive Programming • Paper • arXiv:2505.12925 • Published May 19, 2025