AU-Harness: An Open-Source Toolkit for Holistic Evaluation of Audio LLMs Paper • 2509.08031 • Published Sep 9 • 21
SemEval 2023 Task 6: LegalEval - Understanding Legal Texts Paper • 2304.09548 • Published Apr 19, 2023
DNA Bench: When Silence is Smarter -- Benchmarking Over-Reasoning in Reasoning LLMs Paper • 2503.15793 • Published Mar 20
SynthCypher: A Fully Synthetic Data Generation Framework for Text-to-Cypher Querying in Knowledge Graphs Paper • 2412.12612 • Published Dec 17, 2024 • 4