# AutoAgents Tutorial and Architecture Guide
This guide explains how AutoAgents works end-to-end, the core architecture, and how to run and extend it. It is written for developers who want to understand the components, data flow, and the primary extension points (roles, actions, tools, and UI/service).
## What You’ll Build
- Run AutoAgents in command-line or service mode
- Understand how a Manager plans roles and execution steps
- See how a Group coordinates dynamic expert actions to complete tasks
- Extend the system with custom roles/actions/tools
## Prerequisites
- Python 3.9+ and a shell
- An LLM API key via `OPENAI_API_KEY` (or legacy `LLM_API_KEY`)
- For search tool features, one set of search credentials: `SERPAPI_API_KEY`, `SERPER_API_KEY`, or `GOOGLE_API_KEY` + `GOOGLE_CSE_ID`
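Before a first run, you may want to check these prerequisites in code. The helper below is a hypothetical sketch (`missing_prerequisites` is not part of AutoAgents). It encodes only the rules listed above: an LLM key under either name, plus at least one complete set of search credentials. It checks that the keys are present, not that they are valid.

```python
import os
from typing import List, Mapping


def missing_prerequisites(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return a list of problems; an empty list means the keys look present (not validated)."""
    problems: List[str] = []
    # The LLM key may come from the current or the legacy variable name.
    if not (env.get("OPENAI_API_KEY") or env.get("LLM_API_KEY")):
        problems.append("set OPENAI_API_KEY (or legacy LLM_API_KEY)")
    # Any one complete set of search credentials is enough.
    has_search = bool(
        env.get("SERPAPI_API_KEY")
        or env.get("SERPER_API_KEY")
        or (env.get("GOOGLE_API_KEY") and env.get("GOOGLE_CSE_ID"))
    )
    if not has_search:
        problems.append("set SERPAPI_API_KEY, SERPER_API_KEY, or GOOGLE_API_KEY + GOOGLE_CSE_ID")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print("missing:", problem)
```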
## Quickstart
1) Install
```bash
python setup.py install
```
2) Configure environment (env vars are the single source of truth)
```bash
export OPENAI_API_KEY="sk-..."
# optional, see Config section for more
export SERPAPI_API_KEY="..." # or SERPER_API_KEY / GOOGLE_API_KEY + GOOGLE_CSE_ID
export OPENAI_API_BASE="..." # if using Azure or an OpenAI-compatible endpoint
```
3) Run
- Command-line mode:
```bash
python main.py --mode commandline --idea "Build a CLI snake game"
```
- WebSocket service (for the included web UI):
```bash
python main.py --mode service --host 127.0.0.1 --port 9000
```
Open `frontend/app/demo.html` in a browser to connect to the backend at `/api`.
Tip: You can also pass the `--llm_api_key` and `--serpapi_key` flags. If they are omitted, the values are read from environment variables via `cfg.py`.
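The precedence in the tip (an explicit flag first, then the environment) can be sketched with `argparse`. Only the flag names `--llm_api_key` and `--serpapi_key` come from this guide. The function `resolve_keys` and its exact fallback order are assumptions and do not reproduce the project's `main.py`.

```python
import argparse
import os
from typing import Mapping, Optional, Sequence, Tuple


def resolve_keys(
    argv: Optional[Sequence[str]] = None,
    env: Mapping[str, str] = os.environ,
) -> Tuple[Optional[str], Optional[str]]:
    """Return (llm_key, serpapi_key): a flag wins when given, otherwise fall back to env."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--llm_api_key", default=None)
    parser.add_argument("--serpapi_key", default=None)
    # parse_known_args leaves unrelated flags such as --mode or --idea untouched.
    args, _unknown = parser.parse_known_args(argv)
    llm_key = args.llm_api_key or env.get("OPENAI_API_KEY") or env.get("LLM_API_KEY")
    serpapi_key = args.serpapi_key or env.get("SERPAPI_API_KEY")
    return llm_key, serpapi_key
```

Using `parse_known_args` lets the sketch coexist with the other CLI flags, such as `--mode` and `--idea`, without redefining them.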
## Architecture Overview
At a high level, AutoAgents creates a mini multi-agent organization to solve a task:
1) A Manager plans expert roles and a multi-step execution plan
2) The Environment parses that plan and spins up a Group role
3) The Group orchestrates step-by-step execution using dynamically created expert actions
4) Actions call the LLM and optional tools (e.g., web search) and can write files to `workspace/`
5) Memory records messages to drive subsequent decisions, and observers can critique and refine the plan
Key components and responsibilities:
- LLM Provider: Unified via LiteLLM with rate limiting and cost tracking
- Roles: Manager, Group, Observers, and dynamically created expert roles
- Actions: Units of work run by roles; include planning, checking, and tool-using actions
- Environment: Message bus + memory + role lifecycle and execution loop
- Tools: Search engine wrappers (SerpAPI, Serper, Google CSE)
- Service/UI: WebSocket server and a simple browser UI
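The "message bus + memory" pattern can be illustrated with a toy environment. In this environment, roles watch for messages caused by specific actions and react by publishing new ones. All names here (`ToyEnvironment`, `ToyRole`, `watch`, `cause_by`, `run_once`) are illustrative assumptions. The real `Environment` and `Role` classes in `autoagents/environment.py` and `autoagents/roles/role.py` are richer and may be organized differently.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class Message:
    content: str
    role: str        # sender, e.g. "Question/Task" or "Manager"
    cause_by: str    # which action (or the seed) produced the message


@dataclass
class ToyRole:
    name: str
    watch: Set[str]                      # cause_by values this role reacts to
    react: Callable[[Message], Message]  # stands in for think/act with the LLM
    seen: int = 0                        # index of the first memory entry not yet observed


@dataclass
class ToyEnvironment:
    roles: List[ToyRole] = field(default_factory=list)
    memory: List[Message] = field(default_factory=list)

    def publish_message(self, msg: Message) -> None:
        self.memory.append(msg)

    def run_once(self) -> int:
        """Let each role observe unseen messages and act at most once; return how many acted."""
        acted = 0
        for role in self.roles:
            unseen = self.memory[role.seen:]
            role.seen = len(self.memory)
            relevant = [m for m in unseen if m.cause_by in role.watch]
            if relevant:
                self.publish_message(role.react(relevant[-1]))
                acted += 1
        return acted


# Usage: a planner reacts to the seeded task, an executor reacts to the plan.
env = ToyEnvironment(roles=[
    ToyRole("Manager", {"seed"},
            lambda m: Message(f"plan for {m.content}", "Manager", "CreateRoles")),
    ToyRole("Group", {"CreateRoles"},
            lambda m: Message(f"done: {m.content}", "Group", "Step")),
])
env.publish_message(Message("snake game", "Question/Task", "seed"))
while env.run_once():
    pass
print(env.memory[-1].content)
```

Each role keeps a cursor into shared memory, so it reacts only to messages it has not yet observed. The loop stops once a full pass publishes nothing new.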
### Core Modules (by file)
- Entry Points
  - `main.py`: CLI entry and service launcher
  - `startup.py`: Creates `Explorer`, hires `Manager`, seeds the task, and runs the project
- Orchestration
  - `autoagents/explorer.py`: Wraps an `Environment`, manages investment/budget and main loop
  - `autoagents/environment.py`: Holds roles, memory, history; publishes/observes messages and drives role execution
- Roles
  - `autoagents/roles/role.py`: Base `Role` with thinking/acting/publishing loops and LLM usage
  - `autoagents/roles/manager.py`: The planner; calls actions to create/check roles and plans
  - `autoagents/roles/group.py`: Executes plan steps by dynamically instantiating expert actions
  - `autoagents/roles/observer.py`: Observers (agents and plan) that critique and iterate
  - `autoagents/roles/custom_role.py`: Convenience for custom one-off roles
  - `autoagents/roles/role_bank/*`: Optional predefined roles and mappings
- Actions
  - `autoagents/actions/action/action.py`: Base `Action` with structured-output parsing and repair
  - Planning lifecycle: `create_roles.py`, `check_roles.py`, `check_plans.py`, `steps.py`
  - Execution: `custom_action.py` orchestrates tool use and file writes per step
  - Action bank: domain-specific actions (`action_bank/*`), including `search_and_summarize.py`
- System & Tools
  - `autoagents/system/provider/llm_api.py`: LiteLLM-based provider with RPM limiter and cost tracking
  - `autoagents/system/memory/*`: Message memory store (with optional long-term memory)
  - `autoagents/system/tools/*`: Search engine adapters and enums
  - `autoagents/system/const.py`: Paths (project root, workspace, tmp, etc.)
  - `cfg.py`: Centralized runtime configuration from environment variables
- Service/UI
  - `ws_service.py`: WebSocket server bridging the UI and `startup`
  - `frontend/app/*`: A simple browser UI connecting to `/api`
## Execution Flow
The default command-line flow:
1) `main.py` parses args and chooses mode. Command-line mode calls `startup.startup`.
2) `startup.startup` constructs an `Explorer`, hires a `Manager`, sets the budget, and seeds the task as a `Message(role="Question/Task", ...)` in the `Environment`.
3) `Explorer.run()` loops, calling `Environment.run()` to let roles observe messages and act.
4) `Manager` runs planning actions in cycles, iterating until there are no further suggestions or the maximum number of iterations is reached:
   - CreateRoles → produces “Selected Roles List”, “Created Roles List”, and an “Execution Plan”
   - CheckRoles → critiques role choices and suggests refinements
   - CheckPlans → critiques the execution plan and suggests refinements
5) `Environment.publish_message()` parses the manager’s output to:
   - Extract an ordered list of steps (the plan)
   - Extract role specs (name, prompt, tools, suggestions)
   - Create a `Group` role that dynamically constructs per-role `CustomAction` subclasses
6) `Group` orchestrates the plan:
   - Tracks remaining steps and picks the next step
   - For the step’s responsible role(s), runs the corresponding dynamic action
   - Each action can use tools such as search (`SearchAndSummarize`) and can write files to `workspace/`
   - Emits messages with structured “Step/Response” content
7) Memory and history accumulate across all roles; observers can critique (`ObserverAgents`, `ObserverPlans`) to refine plan/roles.
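Steps 4 and 6 reduce to two loops: a draft-critique-redraft cycle and an ordered walk over the plan. The sketch below models both with plain callables, which stand in for the CreateRoles, CheckRoles, and CheckPlans actions and for the dynamic expert actions. The `(role, instruction)` step shape, the `MAX_ITERATIONS` value, and all function names are assumptions. In AutoAgents, these exchanges are LLM-generated text parsed by `Environment.publish_message()`, not Python tuples.

```python
from typing import Callable, Dict, List, Tuple

Plan = List[Tuple[str, str]]  # (responsible role, step instruction)
MAX_ITERATIONS = 3            # assumed cap; the real limit may differ


def refine(
    create: Callable[[List[str]], Plan],
    check_roles: Callable[[Plan], List[str]],
    check_plan: Callable[[Plan], List[str]],
) -> Plan:
    """Manager-style loop: draft, critique, and redraft until critics have no suggestions."""
    suggestions: List[str] = []
    plan: Plan = []
    for _ in range(MAX_ITERATIONS):
        plan = create(suggestions)
        suggestions = check_roles(plan) + check_plan(plan)
        if not suggestions:
            break
    return plan  # the last draft is kept even if the cap was hit


def execute(plan: Plan, experts: Dict[str, Callable[[str, List[str]], str]]) -> List[str]:
    """Group-style loop: run remaining steps in order, each by its responsible expert."""
    history: List[str] = []
    remaining = list(plan)
    while remaining:
        role, step = remaining.pop(0)
        response = experts[role](step, history)  # experts can read earlier steps
        history.append(f"Step: {step}\nResponse: {response}")
    return history


def create(suggestions: List[str]) -> Plan:
    steps = [("Python Programmer", "write snake.py")]
    if suggestions:  # pretend the critique asked for a testing step
        steps.append(("Tester", "run snake.py"))
    return steps


plan = refine(
    create,
    check_roles=lambda p: [] if len(p) > 1 else ["add a Tester role"],
    check_plan=lambda p: [],
)
log = execute(plan, {
    "Python Programmer": lambda step, history: "wrote snake.py",
    "Tester": lambda step, history: f"passed after {len(history)} earlier step(s)",
})
print(log[-1])
```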
## Configuration
All configuration is read from environment variables by `cfg.py`. Important settings:
- LLM and Provider
  - `OPENAI_API_KEY` (alias: `LLM_API_KEY`)
  - `OPENAI_API_MODEL` (default `gpt-4o`); Azure-style settings: `OPENAI_API_BASE`, `OPENAI_API_TYPE`, `OPENAI_API_VERSION`, `DEPLOYMENT_ID`
  - `RPM`: requests-per-minute limit (minimum 1)
  - `MAX_TOKENS`, `TEMPERATURE`, `TOP_P`, `PRESENCE_PENALTY`, `FREQUENCY_PENALTY`, `N`
  - `LLM_TIMEOUT`: request timeout in seconds
- Budgeting
  - `MAX_BUDGET`: spending limit in dollars; cost is tracked via LiteLLM pricing or a fallback table
- Proxies
  - `GLOBAL_PROXY` or `OPENAI_PROXY` (auto-propagated to `HTTP_PROXY`/`HTTPS_PROXY` when set)
- Additional Models
  - `CLAUDE_API_KEY` (aliases: `ANTHROPIC_API_KEY`, `Anthropic_API_KEY`), `CLAUDE_MODEL`
- Search
  - `SEARCH_ENGINE`: one of `serpapi`, `serper`, `google`, `ddg`, `custom`
  - Keys as applicable: `SERPAPI_API_KEY`, `SERPER_API_KEY`, `GOOGLE_API_KEY`, `GOOGLE_CSE_ID`
- Memory and Parsing
  - `LONG_TERM_MEMORY`: `true`/`false`
  - `LLM_PARSER_REPAIR` enables schema repair for action outputs; `LLM_PARSER_REPAIR_ATTEMPTS` sets how many repair attempts are made
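Here is a sketch of reading a subset of these variables into one typed object. It is not `cfg.py`: the `Settings` and `load_settings` names are assumptions, as are the fallback values for `RPM`, `MAX_BUDGET`, and `SEARCH_ENGINE` and the accepted boolean spellings. Only the variable names, the `gpt-4o` default, the `RPM` minimum of 1, and the proxy propagation come from the list above.

```python
import os
from dataclasses import dataclass
from typing import MutableMapping, Optional

SEARCH_ENGINES = {"serpapi", "serper", "google", "ddg", "custom"}


def _as_bool(value: Optional[str]) -> bool:
    # Assumed truthy spellings for flags such as LONG_TERM_MEMORY.
    return (value or "").strip().lower() in {"1", "true", "yes", "on"}


@dataclass
class Settings:
    api_key: Optional[str]
    model: str
    rpm: int
    max_budget: float
    search_engine: str
    long_term_memory: bool


def load_settings(env: MutableMapping[str, str] = os.environ) -> Settings:
    proxy = env.get("GLOBAL_PROXY") or env.get("OPENAI_PROXY")
    if proxy:
        # Copy the proxy into the variables most HTTP clients honor (existing values are kept).
        env.setdefault("HTTP_PROXY", proxy)
        env.setdefault("HTTPS_PROXY", proxy)
    engine = env.get("SEARCH_ENGINE", "serpapi").strip().lower()  # assumed fallback engine
    if engine not in SEARCH_ENGINES:
        raise ValueError(f"SEARCH_ENGINE must be one of {sorted(SEARCH_ENGINES)}, got {engine!r}")
    return Settings(
        api_key=env.get("OPENAI_API_KEY") or env.get("LLM_API_KEY"),
        model=env.get("OPENAI_API_MODEL", "gpt-4o"),
        rpm=max(1, int(env.get("RPM", "3"))),  # assumed fallback 3; enforce the minimum of 1
        max_budget=float(env.get("MAX_BUDGET", "10.0")),  # assumed fallback budget
        search_engine=engine,
        long_term_memory=_as_bool(env.get("LONG_TERM_MEMORY")),
    )
```

Validating `SEARCH_ENGINE` at load time surfaces a typo immediately rather than on the first search call.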
## Tools and File Output
- Search: `autoagents/system/tools/search_engine.py` routes queries to SerpAPI, Serper, or Google CSE
- Custom engines: pass a `run_func` to `SearchEngine` or set `SEARCH_ENGINE=custom`
- File writes: actions can write files to `workspace/` via `CustomAction` using the `Write File` action format
Workspace location is resolved by `autoagents/system/const.py` and is safe to inspect while the system runs.
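The routing described above can be sketched with stubbed backends: the engine picks an adapter by engine type, or it defers to a caller-supplied `run_func`. `EngineType` and `ToySearchEngine` are hypothetical stand-ins for `SearchEngineType` and `SearchEngine`. Only some engines are modeled, and the backends return canned strings instead of calling SerpAPI, Serper, or Google CSE. Letting `run_func` override the enum is a design choice of this sketch.

```python
from enum import Enum
from typing import Callable, Dict, List, Optional

SearchFunc = Callable[[str, int], List[str]]  # (query, max_results) -> result snippets


class EngineType(Enum):
    SERPAPI = "serpapi"
    SERPER = "serper"
    GOOGLE = "google"
    CUSTOM = "custom"


class ToySearchEngine:
    def __init__(self, engine: EngineType, backends: Dict[EngineType, SearchFunc],
                 run_func: Optional[SearchFunc] = None) -> None:
        if engine is EngineType.CUSTOM and run_func is None:
            raise ValueError("EngineType.CUSTOM requires run_func")
        self.engine = engine
        self.backends = backends
        self.run_func = run_func

    def run(self, query: str, max_results: int = 8) -> List[str]:
        # In this sketch an explicit run_func always wins over the enum-based routing.
        if self.run_func is not None:
            return self.run_func(query, max_results)
        return self.backends[self.engine](query, max_results)


def fake_serper(query: str, max_results: int) -> List[str]:
    # Stub standing in for an HTTP call to Serper.
    return [f"serper: {query} #{i}" for i in range(max_results)]


serper = ToySearchEngine(EngineType.SERPER, {EngineType.SERPER: fake_serper})
custom = ToySearchEngine(EngineType.CUSTOM, {}, run_func=lambda q, n: [q.upper()])
print(serper.run("snake game", max_results=2))
print(custom.run("snake game"))
```

In this shape, adding an engine means adding one enum member and one backend function.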
## Service Mode and Web UI
- Start server: `python main.py --mode service --host 127.0.0.1 --port 9000`
- Open `frontend/app/demo.html` in a browser; it connects to `ws://<host>:<port>/api` or `wss://` for HTTPS
- Provide API keys in the left panel; submit a task idea; watch agents and steps stream in
`ws_service.py` accepts incoming tasks and streams role messages to the UI. Each run executes in its own process and can be interrupted.
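Process-per-run isolation and interruption can be sketched with the standard `subprocess` module. `RunRegistry`, the `run_id` keys, and launching each run as a separate Python command are assumptions for illustration. `ws_service.py` may start and stop runs differently.

```python
import subprocess
import sys
from typing import Dict, List


class RunRegistry:
    """Track one child process per task so each run can be interrupted independently."""

    def __init__(self) -> None:
        self._runs: Dict[str, subprocess.Popen] = {}

    def start(self, run_id: str, argv: List[str]) -> None:
        if run_id in self._runs and self._runs[run_id].poll() is None:
            raise RuntimeError(f"run {run_id} is already active")
        self._runs[run_id] = subprocess.Popen(argv)

    def interrupt(self, run_id: str, timeout: float = 5.0) -> int:
        """Ask the child to stop, escalate to kill if it does not exit, and return its exit code."""
        proc = self._runs.pop(run_id)
        proc.terminate()
        try:
            return proc.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            proc.kill()
            return proc.wait()


if __name__ == "__main__":
    registry = RunRegistry()
    # Stand-in for a real task command such as: [sys.executable, "main.py", "--mode", "commandline", ...]
    registry.start("demo", [sys.executable, "-c", "import time; time.sleep(60)"])
    print("exit code:", registry.interrupt("demo"))
```

On POSIX, `terminate()` sends SIGTERM, so the returned code is typically negative. The kill fallback covers a child that ignores the request.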
## Extending AutoAgents
1) Add a new predefined role (Role Bank)
   - Create a class extending `Role` and implement its actions
   - Register it in `autoagents/roles/role_bank/__init__.py` (in `ROLES_MAPPING`) and optionally in `ROLES_LIST`
2) Add a new action
   - Subclass the base `Action` in `autoagents/actions/action/action.py` and implement `run`
   - If the action needs structured outputs, define an output mapping and use `_aask_v1` for parsing and repair
   - Put domain-specific actions in `autoagents/actions/action_bank/`
3) Add a tool
   - Add a new wrapper in `autoagents/system/tools/`
   - Extend the `SearchEngineType` enum if relevant, and add a route for it in `SearchEngine.run`
4) Create a custom one-off role at runtime
   - `Group` already builds dynamic `CustomAction` classes from the Manager’s role specs (name, prompt, tools, suggestions), so many scenarios don’t require code changes
   - For tailored behavior, see `autoagents/roles/custom_role.py`
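For item 2, the combination of an output mapping with parse-and-repair can be sketched without an LLM. The `## Section` text format, `OUTPUT_MAPPING`, `parse_sections`, and `ask_with_repair` are all hypothetical. The `ask` and `repair` arguments are callables standing in for the model calls that `_aask_v1` and the repair step make. The `attempts` count only loosely mirrors `LLM_PARSER_REPAIR_ATTEMPTS`.

```python
import re
from typing import Callable, Dict, Type

# Hypothetical output mapping: section name -> expected Python type.
OUTPUT_MAPPING: Dict[str, Type] = {"Thought": str, "Steps": list}


def parse_sections(text: str, mapping: Dict[str, Type]) -> Dict[str, object]:
    """Split '## Name' sections and coerce each mapped one; raise ValueError if one is missing."""
    parts = re.split(r"^##\s*(.+?)\s*$", text, flags=re.MULTILINE)
    # With one capture group, re.split returns [preamble, name1, body1, name2, body2, ...].
    raw = {name: body.strip() for name, body in zip(parts[1::2], parts[2::2])}
    result: Dict[str, object] = {}
    for key, typ in mapping.items():
        if key not in raw:
            raise ValueError(f"missing section: {key}")
        if typ is list:
            lines = [line.strip() for line in raw[key].splitlines() if line.strip()]
            result[key] = [line.lstrip("-* ") for line in lines]  # drop bullet markers
        else:
            result[key] = raw[key]
    return result


def ask_with_repair(ask: Callable[[str], str], repair: Callable[[str, str], str],
                    prompt: str, mapping: Dict[str, Type], attempts: int = 2) -> Dict[str, object]:
    """Parse the answer; on failure pass the text and error to `repair`, up to `attempts` times."""
    text = ask(prompt)
    for _ in range(attempts):
        try:
            return parse_sections(text, mapping)
        except ValueError as err:
            text = repair(text, str(err))  # the repair step sees why parsing failed
    return parse_sections(text, mapping)  # last try; a ValueError reaches the caller
```

Passing the parse error to the repair step gives it a concrete target, such as a missing section, instead of a vague "try again".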
## Error Handling, Costs, and Limits
- Rate limiting: `RPM` controls request pacing at the provider level
- Streaming: provider streams tokens to stdout in CLI mode
- Costs: tracked per-request; enforced against `MAX_BUDGET`
- Output robustness: Actions use a schema parser and, if enabled, an LLM repair step to coerce outputs into the expected shape
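Request pacing and budget enforcement can be sketched with a sliding one-minute window and a running cost total. `RpmLimiter`, `BudgetTracker`, `BudgetExceeded`, and the injectable `clock`/`sleep` parameters are hypothetical. In the real provider, the per-request cost comes from LiteLLM pricing or a fallback table. Here, the caller simply passes in a dollar amount.

```python
import time
from collections import deque
from typing import Callable, Deque


class RpmLimiter:
    """Allow at most `rpm` acquisitions in any 60-second window; sleep when the window is full."""

    def __init__(self, rpm: int, clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.rpm = max(1, rpm)  # same minimum of 1 as the RPM setting
        self.clock = clock
        self.sleep = sleep
        self.stamps: Deque[float] = deque()

    def acquire(self) -> None:
        now = self.clock()
        while self.stamps and now - self.stamps[0] >= 60.0:
            self.stamps.popleft()  # forget requests older than one minute
        if len(self.stamps) >= self.rpm:
            self.sleep(60.0 - (now - self.stamps[0]))  # wait until the oldest one expires
            self.stamps.popleft()
            now = self.clock()
        self.stamps.append(now)


class BudgetExceeded(RuntimeError):
    pass


class BudgetTracker:
    def __init__(self, max_budget: float) -> None:
        self.max_budget = max_budget
        self.total = 0.0

    def record(self, cost: float) -> None:
        """Add one request's cost in dollars and stop the run once the budget is exceeded."""
        self.total += cost
        if self.total > self.max_budget:
            raise BudgetExceeded(f"spent ${self.total:.2f} of ${self.max_budget:.2f}")
```

A caller would invoke `acquire()` before each request and `record(cost)` after it. Injecting the clock and sleep functions keeps the limiter testable without real waiting.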
## Troubleshooting
- “Invalid key” or missing search results: ensure that a key for the engine selected by `SEARCH_ENGINE` is set
- Stuck or slow progress: increase `RPM`, check proxy settings, or reduce `TEMPERATURE`
- No files appear: verify the action selected `Write File` and inspect the `workspace/` folder
- Azure setups: ensure `OPENAI_API_BASE`, `OPENAI_API_TYPE=azure`, `OPENAI_API_VERSION`, and `DEPLOYMENT_ID` are set
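The Azure checklist can be turned into a small diagnostic. `azure_problems` is a hypothetical helper. It treats any Azure-specific variable as a sign of Azure intent and checks only that the four settings named above are present, not that their values work.

```python
from typing import List, Mapping

AZURE_VARS = ("OPENAI_API_BASE", "OPENAI_API_TYPE", "OPENAI_API_VERSION", "DEPLOYMENT_ID")


def azure_problems(env: Mapping[str, str]) -> List[str]:
    """List missing or inconsistent Azure settings; [] when nothing suggests an Azure setup."""
    api_type = env.get("OPENAI_API_TYPE", "")
    wants_azure = (
        api_type.lower() == "azure"
        or bool(env.get("OPENAI_API_VERSION"))
        or bool(env.get("DEPLOYMENT_ID"))
    )
    if not wants_azure:
        return []
    problems = [f"{name} is not set" for name in AZURE_VARS if not env.get(name)]
    if api_type and api_type.lower() != "azure":
        problems.append("OPENAI_API_TYPE should be 'azure'")
    return problems
```

Calling `azure_problems(os.environ)` at startup reports every gap at once rather than failing on the first request.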
## Next Steps
- Wire in more tools (browsers, code execution, retrieval)
- Add new observers for domain-specific validation
- Build richer UIs or APIs on top of the WebSocket service
If you have ideas or questions, please open an issue or PR. Enjoy building with AutoAgents!