Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities • arXiv:2507.06261 • Published Jul 7, 2025
NATURAL PLAN: Benchmarking LLMs on Natural Language Planning • arXiv:2406.04520 • Published Jun 6, 2024
Beyond ChatBots: ExploreLLM for Structured Thoughts and Personalized Model Responses • arXiv:2312.00763 • Published Dec 1, 2023
Instruction-Following Evaluation for Large Language Models • arXiv:2311.07911 • Published Nov 14, 2023
InstructExcel: A Benchmark for Natural Language Instruction in Excel • arXiv:2310.14495 • Published Oct 23, 2023
How FaR Are Large Language Models From Agents with Theory-of-Mind? • arXiv:2310.03051 • Published Oct 4, 2023
Large Language Models Cannot Self-Correct Reasoning Yet • arXiv:2310.01798 • Published Oct 3, 2023