---
title: Browser
emoji: 🦀
colorFrom: purple
colorTo: indigo
sdk: gradio
sdk_version: 5.34.2
app_file: app.py
pinned: false
---
# Browser API
This document describes how to use the Browser API to search the web and scrape website content. The API is built with Gradio and Playwright, providing a simple interface for web automation tasks.
## API Endpoint
The primary endpoint for this API is `/api/web_browse`. This is a `POST` endpoint that accepts a JSON payload.
## Authentication
This API is public and does not require authentication.
## Actions
The API can perform two main actions: `Search` and `Scrape URL`.
### Search
The `Search` action allows you to perform a web search using a specified search engine. The API will return the content of the search results page in Markdown format.
### Scrape URL
The `Scrape URL` action allows you to retrieve the content of a specific URL. The API will fetch the page, process the HTML, and return the main content in a clean, readable Markdown format.
## Request Body
The request body must be a JSON object with the following structure. The `|` separates the allowed values for a field and is not literal JSON:
```json
{
"action": "Search" | "Scrape URL",
"query": "string",
"browser_name": "firefox" | "chromium" | "webkit",
"search_engine_name": "string"
}
```
**Parameters:**
* `action` (string, required): The action to perform. Must be either `"Search"` or `"Scrape URL"`.
* `query` (string, required): The search query or the URL to scrape.
* `browser_name` (string, optional): The browser to use for the operation. Defaults to `"firefox"`.
    * Available options: `"firefox"`, `"chromium"`, `"webkit"`.
* `search_engine_name` (string, optional): The search engine to use when the action is `"Search"`. Defaults to `"DuckDuckGo"`.
    * A full list of supported search engines can be found in the "Supported Search Engines" section.
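The parameter rules above can be checked on the client before a request is sent. The following is a minimal sketch and not part of the API itself: `build_payload` is a hypothetical helper. It assumes that leaving out unset optional fields lets the server apply its documented defaults, and that `search_engine_name` can be dropped for `"Scrape URL"` requests.

```python
import json

VALID_ACTIONS = {"Search", "Scrape URL"}
VALID_BROWSERS = {"firefox", "chromium", "webkit"}


def build_payload(action, query, browser_name=None, search_engine_name=None):
    """Build a request body dict that follows the parameter list above.

    Hypothetical client-side helper. Optional fields are omitted when unset,
    on the assumption that the server then applies its documented defaults.
    """
    if action not in VALID_ACTIONS:
        raise ValueError(f"action must be one of {sorted(VALID_ACTIONS)}")
    if not isinstance(query, str) or not query.strip():
        raise ValueError("query must be a non-empty string")

    payload = {"action": action, "query": query}

    if browser_name is not None:
        if browser_name not in VALID_BROWSERS:
            raise ValueError(f"browser_name must be one of {sorted(VALID_BROWSERS)}")
        payload["browser_name"] = browser_name

    # Assumption: the search engine only matters for the "Search" action,
    # so it is not sent with "Scrape URL" requests.
    if search_engine_name is not None and action == "Search":
        payload["search_engine_name"] = search_engine_name

    return payload


# Serialize a search request body to JSON text.
body = json.dumps(build_payload("Search", "latest AI research", search_engine_name="Google"))
```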
## Response Body
The API will return a JSON object with the results of the operation.
**On Success:**
```json
{
"status": "success",
"query": "your_query",
"action": "Search" | "Scrape URL",
"final_url": "https://example.com",
"page_title": "Example Domain",
"http_status": 200,
"proxy_used": "Direct Connection",
"markdown_content": "# Example Domain..."
}
```
**On Error:**
```json
{
"status": "error",
"query": "your_query",
"proxy_used": "Direct Connection",
"error_message": "Navigation Timeout: The page for 'your_query' took too long to load."
}
```
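A client will usually want to branch on the `status` field shown in the two response shapes above. The sketch below is one way to do that. `BrowseError` and `parse_response` are hypothetical names, and raising an exception on `"error"` is a design choice of this example, not something the API prescribes.

```python
import json


class BrowseError(Exception):
    """Raised when the response has status "error" (hypothetical client-side type)."""


def parse_response(body):
    """Parse a JSON response body and return the success dict.

    Raises BrowseError with the API's error_message when status is "error",
    and ValueError for any other status, which the document does not describe.
    """
    data = json.loads(body)
    status = data.get("status")
    if status == "success":
        return data
    if status == "error":
        raise BrowseError(data.get("error_message", "unknown error"))
    raise ValueError(f"unexpected status: {status!r}")
```

On success, the returned dict exposes the documented fields directly, for example `parse_response(body)["markdown_content"]`.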
## Examples
Here are some examples of how to use the API with `curl`.
### Example 1: Performing a Search
This example performs a search for "latest AI research" using Google.
```bash
curl -X POST -H "Content-Type: application/json" \
-d '{
"action": "Search",
"query": "latest AI research",
"browser_name": "chromium",
"search_engine_name": "Google"
}' \
https://broadfield-dev-browser.hf.space/api/web_browse
```
### Example 2: Scraping a URL
This example scrapes the content from the Wikipedia page for "Web scraping".
```bash
curl -X POST -H "Content-Type: application/json" \
-d '{
"action": "Scrape URL",
"query": "https://en.wikipedia.org/wiki/Web_scraping",
"browser_name": "firefox"
}' \
https://broadfield-dev-browser.hf.space/api/web_browse
```
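The same requests can be sent from Python. This is a minimal sketch using only the standard library. `make_request` and `web_browse` are hypothetical helper names, and the 120-second timeout is an assumed value, chosen because a browser-backed page load can be slow.

```python
import json
import urllib.request

API_URL = "https://broadfield-dev-browser.hf.space/api/web_browse"


def make_request(payload):
    """Build a POST request carrying the JSON payload, mirroring the curl examples."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def web_browse(payload, timeout=120):
    """Send the request and return the decoded JSON response as a dict.

    The timeout default is an assumption, not a documented limit.
    """
    with urllib.request.urlopen(make_request(payload), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For instance, `web_browse({"action": "Scrape URL", "query": "https://en.wikipedia.org/wiki/Web_scraping", "browser_name": "firefox"})` corresponds to Example 2.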
## Supported Search Engines
The following search engines are supported when using the `"Search"` action:
* Google
* DuckDuckGo
* Bing
* Brave
* Ecosia
* Yahoo
* Startpage
* Qwant
* Swisscows
* You.com
* SearXNG
* MetaGer
* Yandex
* Baidu
* Perplexity
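The list above can be mirrored on the client to catch misspelled engine names early. This is a sketch under assumptions: `canonical_engine_name` is a hypothetical helper, and because the document does not say whether the API matches names case-sensitively, the helper always returns the spelling used in the list.

```python
SUPPORTED_SEARCH_ENGINES = (
    "Google", "DuckDuckGo", "Bing", "Brave", "Ecosia",
    "Yahoo", "Startpage", "Qwant", "Swisscows", "You.com",
    "SearXNG", "MetaGer", "Yandex", "Baidu", "Perplexity",
)

# Lookup table from lowercased name to the spelling used in the list above.
_BY_LOWER = {name.lower(): name for name in SUPPORTED_SEARCH_ENGINES}


def canonical_engine_name(name):
    """Return the listed spelling for a search engine name, ignoring case.

    Raises ValueError for names that are not in the supported list.
    """
    try:
        return _BY_LOWER[name.strip().lower()]
    except KeyError:
        raise ValueError(f"unsupported search engine: {name!r}") from None
```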