BigCodeArena: Unveiling More Reliable Human Preferences in Code Generation via Execution Paper • 2510.08697 • Published 19 days ago • 32
Article BigCodeArena: Judging code generations end to end with code executions By bigcode • 22 days ago • 16
The Illusion of Diminishing Returns: Measuring Long Horizon Execution in LLMs Paper • 2509.09677 • Published Sep 11 • 34
NextCoder Collection NextCoder family of code-editing LMs developed with Selective Knowledge Transfer and its training data. • 6 items • Updated Jul 9 • 71
Running on CPU Upgrade 13.6k Open LLM Leaderboard 🏆 Track, rank and evaluate open LLMs and chatbots