arxiv:2510.26787

Remote Labor Index: Measuring AI Automation of Remote Work

Published on Oct 30
· Submitted by taesiri on Oct 31

Abstract

AIs have made rapid progress on research-oriented benchmarks of knowledge and reasoning, but it remains unclear how these gains translate into economic value and automation. To measure this, we introduce the Remote Labor Index (RLI), a broadly multi-sector benchmark comprising real-world, economically valuable projects designed to evaluate end-to-end agent performance in practical settings. AI agents perform near the floor on RLI, with the highest-performing agent achieving an automation rate of 2.5%. These results help ground discussions of AI automation in empirical evidence, setting a common basis for tracking AI impacts and enabling stakeholders to proactively navigate AI-driven labor automation.

Community

Paper submitter
edited 3 days ago
  • What: Introduces the Remote Labor Index (RLI), a multi-sector benchmark of real, paid remote-work projects that tests end-to-end AI agents on economically valuable tasks.
  • Why: Existing benchmarks don’t reflect real workplace automation or economic value; RLI aims to measure actual completion of freelance-style projects.
  • Scale: Projects total 6,000+ human hours and >$140k of real work across areas like game dev, product design, architecture, data analysis, and animation.
  • How scored: Reports an “automation rate”, the share of projects an agent completes to acceptable quality. Frontier agents perform near the floor.
  • Results (v1): The best agent reaches only 2.5% automation (Manus 2.5%, Grok-4 and Sonnet-4.5 ~2.1%, GPT-5 1.7%, ChatGPT agent 1.3%, Gemini 2.5 Pro 0.8%).
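
To make the "automation rate" concrete, here is a minimal sketch of the arithmetic, assuming it is simply the fraction of projects judged acceptable. The names `ProjectResult`, `acceptable`, and `automation_rate` are hypothetical and do not come from the paper, and the 40-project input is toy data, not the benchmark's actual results.

```python
from dataclasses import dataclass


@dataclass
class ProjectResult:
    """One evaluated project (hypothetical structure, for illustration only)."""
    project_id: str
    acceptable: bool  # assumed flag: True if the deliverable met acceptable quality


def automation_rate(results: list[ProjectResult]) -> float:
    """Return the share of projects completed to acceptable quality (0.0 to 1.0)."""
    if not results:
        raise ValueError("need at least one evaluated project")
    completed = sum(1 for r in results if r.acceptable)
    return completed / len(results)


# Toy data: 40 made-up projects, exactly one judged acceptable.
toy_results = [ProjectResult(f"p{i}", acceptable=(i == 0)) for i in range(40)]
toy_rate = automation_rate(toy_results)
print(f"{toy_rate:.1%}")  # 1 of 40 projects
```

Under this reading, a 2.5% rate means roughly one project in forty was completed to an acceptable standard; how acceptability is judged is defined by the paper, not by this sketch.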

Takeaway: Despite strong scores on research benchmarks, today’s agents barely automate real freelance-style work; RLI provides a concrete yardstick to track progress. 
