EpiQAL: Benchmarking Large Language Models in Epidemiological Question Answering for Enhanced Alignment and Reasoning • Paper • 2601.03471 • Published 1 day ago • 4
QuCo-RAG: Quantifying Uncertainty from the Pre-training Corpus for Dynamic Retrieval-Augmented Generation • Paper • 2512.19134 • Published 17 days ago • 31
MVI-Bench: A Comprehensive Benchmark for Evaluating Robustness to Misleading Visual Inputs in LVLMs • Paper • 2511.14159 • Published Nov 18, 2025 • 24