LLM_eval - a JayLin26 Collection

updated Aug 29, 2025

cais/hle

Benchmark • Updated about 9 hours ago • 2.5k • 19.9k • 660

Large Language Models and Mathematical Reasoning Failures

Paper • 2502.11574 • Published Feb 17, 2025 • 3