AutoAgents Tutorial and Architecture Guide
This guide explains how AutoAgents works end-to-end, the core architecture, and how to run and extend it. It is written for developers who want to understand the components, data flow, and the primary extension points (roles, actions, tools, and UI/service).
What You’ll Build
- Run AutoAgents in command-line or service mode
- Understand how a Manager plans roles and execution steps
- See how a Group coordinates dynamic expert actions to complete tasks
- Extend the system with custom roles/actions/tools
Prerequisites
- Python 3.9+ and a shell
- An LLM API key via `OPENAI_API_KEY` (or the legacy `LLM_API_KEY`)
- One search key (any of) for search tool features: `SERPAPI_API_KEY`, `SERPER_API_KEY`, or `GOOGLE_API_KEY` + `GOOGLE_CSE_ID`
Quickstart
- Install
python setup.py install
- Configure environment (env vars are the single source of truth)
export OPENAI_API_KEY="sk-..."
# optional, see Config section for more
export SERPAPI_API_KEY="..." # or SERPER_API_KEY / GOOGLE_API_KEY + GOOGLE_CSE_ID
export OPENAI_API_BASE="..." # if using Azure/OpenAI compatible endpoints
- Run
- Command-line mode:
python main.py --mode commandline --idea "Build a CLI snake game"
- WebSocket service (for the included web UI):
python main.py --mode service --host 127.0.0.1 --port 9000
Open `frontend/app/demo.html` in a browser to connect to the backend at `/api`.
Tip: You can also pass the `--llm_api_key` and `--serpapi_key` flags. If omitted, `cfg.py` reads the values from environment variables.
Architecture Overview
At a high level, AutoAgents creates a mini multi-agent organization to solve a task:
- A Manager plans expert roles and a multi-step execution plan
- The Environment parses that plan and spins up a Group role
- The Group orchestrates step-by-step execution using dynamically created expert actions
- Actions call the LLM and optional tools (e.g., web search) and can write files to `workspace/`
- Memory records messages to drive subsequent decisions, and observers can critique and refine the plan
Key components and responsibilities:
- LLM Provider: Unified via LiteLLM with rate limiting and cost tracking
- Roles: Manager, Group, Observers, and dynamically created expert roles
- Actions: Units of work run by roles; include planning, checking, and tool-using actions
- Environment: Message bus + memory + role lifecycle and execution loop
- Tools: Search engine wrappers (SerpAPI, Serper, Google CSE)
- Service/UI: WebSocket server and a simple browser UI
Core Modules (by file)
Entry Points
- `main.py`: CLI entry and service launcher
- `startup.py`: Creates `Explorer`, hires `Manager`, seeds the task, and runs the project
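A minimal sketch of the flow `startup.py` implements, based on the description above. `Explorer` and `Manager` come from the modules listed here, but the method names (`hire`, `invest`, `start_project`) and arguments are assumptions; check `startup.py` for the real API:

```python
# Hypothetical sketch of startup.py's job: hire a Manager, set a budget,
# seed the task, and run. Method names are assumed, not confirmed.
import asyncio

from autoagents.explorer import Explorer
from autoagents.roles.manager import Manager

async def run_project(idea: str, budget: float = 3.0) -> None:
    explorer = Explorer()
    explorer.hire([Manager()])      # the planning role
    explorer.invest(budget)         # budget in dollars, cf. MAX_BUDGET
    explorer.start_project(idea)    # seeds the "Question/Task" Message
    await explorer.run()            # drives Environment.run() until done

asyncio.run(run_project("Build a CLI snake game"))
```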
Orchestration
- `autoagents/explorer.py`: Wraps an `Environment`; manages investment/budget and the main loop
- `autoagents/environment.py`: Holds roles, memory, and history; publishes/observes messages and drives role execution
Roles
- `autoagents/roles/role.py`: Base `Role` with thinking/acting/publishing loops and LLM usage
- `autoagents/roles/manager.py`: The planner; calls actions to create/check roles and plans
- `autoagents/roles/group.py`: Executes plan steps by dynamically instantiating expert actions
- `autoagents/roles/observer.py`: Observers (agents and plan) that critique and iterate
- `autoagents/roles/custom_role.py`: Convenience for custom one-off roles
- `autoagents/roles/role_bank/*`: Optional predefined roles and mappings
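Schematically, the base `Role` loop in `role.py` amounts to an observe → think → act → publish cycle. The sketch below mirrors that description rather than the exact implementation; all names are illustrative:

```python
# Simplified, illustrative view of a Role's observe→think→act→publish cycle.
# Not the actual implementation; see autoagents/roles/role.py.
async def role_step(role, env):
    news = role.observe(env.memory)       # unread messages relevant to this role
    if not news:
        return
    action = role.think(news)             # choose the next Action (may ask the LLM)
    message = await action.run(news)      # act: call the LLM and any tools
    env.publish_message(message)          # make the result visible to other roles
```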
Actions
- `autoagents/actions/action/action.py`: Base `Action` with structured-output parsing and repair
- Planning lifecycle: `create_roles.py`, `check_roles.py`, `check_plans.py`, `steps.py`
- Execution: `custom_action.py` orchestrates tool use and file writes per step
- Action bank: domain-specific actions (`action_bank/*`), including `search_and_summarize.py`
System & Tools
- `autoagents/system/provider/llm_api.py`: LiteLLM-based provider with RPM limiter and cost tracking
- `autoagents/system/memory/*`: Message memory store (with optional long-term memory)
- `autoagents/system/tools/*`: Search engine adapters and enums
- `autoagents/system/const.py`: Paths (project root, workspace, tmp, etc.)
- `cfg.py`: Centralized runtime configuration from environment variables
Service/UI
- `ws_service.py`: WebSocket server bridging the UI and `startup`
- `frontend/app/*`: A simple browser UI connecting to `/api`
Execution Flow
The default command-line flow:
1. `main.py` parses arguments and chooses the mode. Command-line mode calls `startup.startup`.
2. `startup.startup` constructs an `Explorer`, hires a `Manager`, sets the budget, and seeds the task as a `Message(role="Question/Task", ...)` in the `Environment`.
3. `Explorer.run()` loops, calling `Environment.run()` to let roles observe messages and act.
4. `Manager` runs planning actions in cycles:
   - CreateRoles → produces a "Selected Roles List", a "Created Roles List", and an "Execution Plan"
   - CheckRoles → critiques the role choices and suggests refinements
   - CheckPlans → critiques the execution plan and suggests refinements
   The manager iterates until there are no further suggestions or a maximum iteration count is reached.
5. `Environment.publish_message()` parses the manager's output to:
   - Extract an ordered list of steps (the plan)
   - Extract role specs (name, prompt, tools, suggestions)
   - Create a `Group` role that dynamically constructs per-role `CustomAction` subclasses
6. `Group` orchestrates the plan:
   - Tracks the remaining steps and picks the next one
   - For each step's responsible role(s), runs the corresponding dynamic action
   - Each action can use tools such as search (`SearchAndSummarize`) and can write files to `workspace/`
   - Emits messages with structured "Step/Response" content
7. Memory and history accumulate across all roles; observers (`ObserverAgents`, `ObserverPlans`) can critique to refine the plan and roles.
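In pseudocode, the `Group` orchestration loop (step 6 above) looks roughly like this; attribute and method names such as `remaining_steps` and `dynamic_actions` are illustrative, not the actual `group.py` API:

```python
# Illustrative pseudocode for the Group loop; names are assumptions,
# not the actual group.py fields. See autoagents/roles/group.py.
async def run_plan(group, env):
    while group.remaining_steps:
        step = group.remaining_steps.pop(0)
        for role_name in step.responsible_roles:
            action = group.dynamic_actions[role_name]    # a CustomAction subclass
            result = await action.run(step, env.memory)  # may search / write files
            env.publish_message(f"Step: {step.text}\nResponse: {result}")
```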
Configuration
All configuration is read from environment variables by `cfg.py`. Important settings:
LLM and Provider
- `OPENAI_API_KEY` (alias: `LLM_API_KEY`)
- `OPENAI_API_MODEL` (default `gpt-4o`); Azure style: `OPENAI_API_BASE`, `OPENAI_API_TYPE`, `OPENAI_API_VERSION`, `DEPLOYMENT_ID`
- `RPM`: requests-per-minute limiter (minimum 1)
- `MAX_TOKENS`, `TEMPERATURE`, `TOP_P`, `PRESENCE_PENALTY`, `FREQUENCY_PENALTY`, `N`
- `LLM_TIMEOUT`: timeout in seconds
Budgeting
- `MAX_BUDGET`: dollars; cost is tracked via LiteLLM pricing or a fallback table
Proxies
- `GLOBAL_PROXY` or `OPENAI_PROXY` (auto-propagated to `HTTP_PROXY`/`HTTPS_PROXY` when set)
Additional Models
- `CLAUDE_API_KEY` (aliases: `ANTHROPIC_API_KEY`, `Anthropic_API_KEY`), `CLAUDE_MODEL`
Search
- `SEARCH_ENGINE`: one of `serpapi`, `serper`, `google`, `ddg`, `custom`
- Keys as applicable: `SERPAPI_API_KEY`, `SERPER_API_KEY`, `GOOGLE_API_KEY`, `GOOGLE_CSE_ID`
Memory and Parsing
- `LONG_TERM_MEMORY`: true/false
- `LLM_PARSER_REPAIR`, `LLM_PARSER_REPAIR_ATTEMPTS`: enable schema repair for action outputs
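The pattern behind `cfg.py` is plain environment lookups with aliases and typed defaults. A minimal sketch of that pattern; the variable names match the list above, but the defaults and helper logic are illustrative:

```python
# Illustrative sketch of the cfg.py pattern: env vars are the single source
# of truth, with aliases and typed defaults. Defaults shown are examples.
import os

OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY") or os.environ.get("LLM_API_KEY")
OPENAI_API_MODEL = os.environ.get("OPENAI_API_MODEL", "gpt-4o")
RPM = max(1, int(os.environ.get("RPM", "10")))            # min 1, per the limiter
MAX_BUDGET = float(os.environ.get("MAX_BUDGET", "3.0"))   # dollars
LONG_TERM_MEMORY = os.environ.get("LONG_TERM_MEMORY", "false").lower() == "true"
```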
Tools and File Output
- Search: `autoagents/system/tools/search_engine.py` routes queries to SerpAPI, Serper, or Google CSE
- Custom engines: pass a `run_func` to `SearchEngine` or set `SEARCH_ENGINE=custom` (see the sketch below)
- File writes: actions can write files to `workspace/` via `CustomAction` using the `Write File` action format
The workspace location is resolved by `autoagents/system/const.py` and is safe to inspect while the system runs.
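For the custom-engine hook mentioned above, a minimal sketch; the `SearchEngine` import path matches the module listed, but the exact constructor and `run_func` signature are assumptions:

```python
# Hypothetical example of wiring a custom search backend via run_func.
# Whether run() is sync or async depends on the adapter; verify in
# autoagents/system/tools/search_engine.py.
from autoagents.system.tools.search_engine import SearchEngine

def my_search(query: str) -> str:
    # Call your own index or API here and return result text.
    return f"stub results for: {query}"

engine = SearchEngine(run_func=my_search)
```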
Service Mode and Web UI
- Start the server: `python main.py --mode service --host 127.0.0.1 --port 9000`
- Open `frontend/app/demo.html` in a browser; it connects to `ws://<host>:<port>/api` (or `wss://` for HTTPS)
- Provide API keys in the left panel, submit a task idea, and watch agents and steps stream in
`ws_service.py` manages incoming tasks and streams role messages to the UI. Each run is isolated in its own process and can be interrupted.
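You can also drive the service without the bundled UI. A minimal client sketch using the third-party `websockets` package; the JSON payload shape is an assumption, so check `ws_service.py` for the actual message protocol:

```python
# Hypothetical client for the /api endpoint. The {"idea": ...} payload is
# an assumption; ws_service.py defines the real message format.
import asyncio
import json

import websockets  # pip install websockets

async def main():
    async with websockets.connect("ws://127.0.0.1:9000/api") as ws:
        await ws.send(json.dumps({"idea": "Build a CLI snake game"}))
        async for message in ws:
            print(message)  # streamed role/step updates

asyncio.run(main())
```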
Extending AutoAgents
- Add a new predefined role (Role Bank)
  - Create a class extending `Role` and implement its actions
  - Register it in `autoagents/roles/role_bank/__init__.py` (in `ROLES_MAPPING`) and optionally in `ROLES_LIST`
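A hedged sketch of what a Role Bank entry might look like. The base-class paths come from the file list above, but the `Role` constructor arguments and the `_init_actions` helper are assumptions; verify against `autoagents/roles/role.py`:

```python
# Hypothetical predefined role; check the Role constructor and action
# registration helper against autoagents/roles/role.py before using.
from autoagents.actions.action.action import Action
from autoagents.roles.role import Role

class SummarizeFindings(Action):
    async def run(self, context: str) -> str:
        return await self._aask(f"Summarize the key findings:\n{context}")

class ResearchAnalyst(Role):
    def __init__(self):
        super().__init__(name="ResearchAnalyst",
                         profile="Summarizes research findings")
        self._init_actions([SummarizeFindings])

# Then register it in autoagents/roles/role_bank/__init__.py:
#   ROLES_MAPPING["ResearchAnalyst"] = ResearchAnalyst
#   ROLES_LIST.append("ResearchAnalyst")  # optional
```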
- Add a new action
  - Extend `autoagents/actions/action/action.py` and implement `run`
  - If the action needs structured outputs, define an output mapping and use `_aask_v1` for parsing/repair
  - Add domain-specific actions in `autoagents/actions/action_bank/`
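For structured outputs, the sketch below shows the general shape of an action built on `_aask_v1`. The `OUTPUT_MAPPING` format and the `_aask_v1` call signature are assumptions modeled on the planning actions described above:

```python
# Hypothetical structured-output action. OUTPUT_MAPPING keys/types and the
# _aask_v1 signature are assumptions; compare with create_roles.py.
from autoagents.actions.action.action import Action

OUTPUT_MAPPING = {
    "Summary": (str, ...),
    "Key Points": (list, ...),
}

class DigestDocument(Action):
    async def run(self, document: str):
        prompt = f"Summarize the document and list key points:\n{document}"
        # _aask_v1 parses the reply against OUTPUT_MAPPING and, if enabled,
        # runs the LLM repair step on malformed output.
        return await self._aask_v1(prompt, "digest_document", OUTPUT_MAPPING)
```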
- Add a tool
  - Add a new wrapper in `autoagents/system/tools/`
  - Extend the `SearchEngineType` enum if relevant, and route inside `SearchEngine.run`
- Create a custom one-off role at runtime
  - `Group` already builds dynamic `CustomAction` classes from the Manager's role specs (name, prompt, tools, suggestions), so many scenarios don't require code changes
  - For tailored behavior, see `autoagents/roles/custom_role.py`
Error Handling, Costs, and Limits
- Rate limiting: `RPM` controls request pacing at the provider level
- Streaming: the provider streams tokens to stdout in CLI mode
- Costs: tracked per request and enforced against `MAX_BUDGET`
- Output robustness: actions use a schema parser and, if enabled, an LLM repair step to coerce outputs into the expected shape
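The pacing idea behind `RPM` is simple: space calls at least `60 / RPM` seconds apart. A minimal, self-contained sketch of the concept, not the provider's actual implementation:

```python
# Conceptual RPM limiter; the real one lives in llm_api.py.
import asyncio
import time

class RPMLimiter:
    def __init__(self, rpm: int):
        self.interval = 60.0 / max(1, rpm)   # seconds between requests
        self._last = 0.0

    async def wait(self) -> None:
        delay = self.interval - (time.monotonic() - self._last)
        if delay > 0:
            await asyncio.sleep(delay)
        self._last = time.monotonic()
```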
Troubleshooting
- “Invalid key” or missing search results: ensure one of the search keys is set
- Stuck or slow progress: increase `RPM`, check proxy settings, or reduce `TEMPERATURE`
- No files appear: verify the action selected `Write File` and inspect the `workspace/` folder
- Azure setups: ensure `OPENAI_API_BASE`, `OPENAI_API_TYPE=azure`, `OPENAI_API_VERSION`, and `DEPLOYMENT_ID` are set
Next Steps
- Wire in more tools (browsers, code execution, retrieval)
- Add new observers for domain-specific validation
- Build richer UIs or APIs on top of the WebSocket service
If you have ideas or questions, please open an issue or PR. Enjoy building with AutoAgents!