MLLM as a UI Judge: Benchmarking Multimodal LLMs for Predicting Human Perception of User Interfaces Paper • 2510.08783 • Published 24 days ago • 4
Learning to Route LLMs from Bandit Feedback: One Policy, Many Trade-offs Paper • 2510.07429 • Published 25 days ago • 3
Optimizing Data Delivery: Insights from User Preferences on Visuals, Tables, and Text Paper • 2411.07451 • Published Nov 12, 2024
MODS: Moderating a Mixture of Document Speakers to Summarize Debatable Queries in Document Collections Paper • 2502.00322 • Published Feb 1
A Personalized Conversational Benchmark: Towards Simulating Personalized Conversations Paper • 2505.14106 • Published May 20
The Photographer Eye: Teaching Multimodal Large Language Models to See and Critique like Photographers Paper • 2509.18582 • Published Sep 23 • 2
CommonForms: A Large, Diverse Dataset for Form Field Detection Paper • 2509.16506 • Published Sep 20 • 18
mSCoRe: a Multilingual and Scalable Benchmark for Skill-based Commonsense Reasoning Paper • 2508.10137 • Published Aug 13 • 2