Proof of Time: A Benchmark for Evaluating Scientific Idea Judgments Article • 2 days ago • 7
ClinicalBench: Can LLMs Beat Traditional ML Models in Clinical Prediction? Paper • 2411.06469 • Published Nov 10, 2024 • 17
Language Models are Surprisingly Fragile to Drug Names in Biomedical Benchmarks Paper • 2406.12066 • Published Jun 17, 2024 • 8
Rabbits Leaderboard 💊: Visualize and analyze language model robustness to drug name synonyms Space • Running • 20
Large Language Models to Identify Social Determinants of Health in Electronic Health Records Paper • 2308.06354 • Published Aug 11, 2023 • 3
The impact of using an AI chatbot to respond to patient messages Paper • 2310.17703 • Published Oct 26, 2023 • 5