Flaw or Artifact? Rethinking Prompt Sensitivity in Evaluating LLMs • Paper 2509.01790 • Published Sep 1, 2025