Why LLMs Aren't Scientists Yet: Lessons from Four Autonomous Research Attempts
Abstract
A case study of four attempts to autonomously generate ML research papers using LLM agents reveals recurring failure modes and proposes design principles for robust AI-scientist systems.
We report a case study of four end-to-end attempts to autonomously generate ML research papers using a pipeline of six LLM agents mapped to stages of the scientific workflow. Three of the four attempts failed during implementation or evaluation. One completed the pipeline and was accepted to Agents4Science 2025, an experimental inaugural venue that required AI systems as first authors, passing both human and multi-AI review. From these attempts, we document six recurring failure modes: bias toward training-data defaults, implementation drift under execution pressure, memory and context degradation across long-horizon tasks, overexcitement that declares success despite obvious failures, insufficient domain intelligence, and weak scientific taste in experimental design. We conclude by discussing four design principles for more robust AI-scientist systems and implications for autonomous scientific discovery, and we release all prompts, artifacts, and outputs at https://github.com/Lossfunk/ai-scientist-artefacts-v1.
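To make the setup concrete, below is a minimal sketch of what a sequential pipeline of six LLM agents mapped to scientific-workflow stages could look like. The stage names, the Artifact container, and the stub LLM call are illustrative assumptions; the abstract does not enumerate the actual agents' roles or interfaces.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical stage names: the paper maps six agents to stages of the
# scientific workflow, but the abstract does not list them explicitly.
STAGES = [
    "ideation",
    "literature_review",
    "experiment_design",
    "implementation",
    "evaluation",
    "paper_writing",
]


@dataclass
class Artifact:
    """Accumulated outputs handed from one agent to the next."""
    history: List[str] = field(default_factory=list)


def make_agent(stage: str, llm: Callable[[str], str]) -> Callable[[Artifact], Artifact]:
    """Wrap a single LLM call as one pipeline stage."""
    def run(artifact: Artifact) -> Artifact:
        prompt = f"Stage: {stage}\nPrior outputs:\n" + "\n".join(artifact.history)
        artifact.history.append(f"[{stage}] {llm(prompt)}")
        return artifact
    return run


def run_pipeline(llm: Callable[[str], str]) -> Artifact:
    """Run all six stages in order, threading the shared artifact through."""
    artifact = Artifact()
    for stage in STAGES:
        artifact = make_agent(stage, llm)(artifact)
    return artifact


if __name__ == "__main__":
    # Stub LLM so the sketch runs without any API access.
    def echo_llm(prompt: str) -> str:
        return f"(placeholder output for: {prompt.splitlines()[0]})"

    result = run_pipeline(echo_llm)
    print("\n".join(result.history))
```

A purely sequential hand-off like this also illustrates why the reported failure modes compound: errors introduced at one stage (e.g., implementation drift) propagate unchecked into evaluation and writing.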
Community
We find that LLMs aren't scientists yet.
This is an automated message from the Librarian Bot. The following papers, similar to this one, were recommended by the Semantic Scholar API:
- NL2Repo-Bench: Towards Long-Horizon Repository Generation Evaluation of Coding Agents (2025)
- DeepCode: Open Agentic Coding (2025)
- HeurekaBench: A Benchmarking Framework for AI Co-scientist (2026)
- Multi-Agent Systems for Dataset Adaptation in Software Engineering: Capabilities, Limitations, and Future Directions (2025)
- SelfAI: Building a Self-Training AI System with LLM Agents (2025)
- OmniScientist: Toward a Co-evolving Ecosystem of Human and AI Scientists (2025)
- AInsteinBench: Benchmarking Coding Agents on Scientific Repositories (2025)