
title: Browser
emoji: 🦀
colorFrom: purple
colorTo: indigo
sdk: gradio
sdk_version: 5.34.2
app_file: app.py
pinned: false

Browser API

This document describes how to use the Browser API to search the web and scrape website content. The API is built with Gradio and Playwright, providing a simple interface for web automation tasks.

API Endpoint

The primary endpoint for this API is /api/web_browse. It accepts POST requests with a JSON payload.

Authentication

This API is public and does not require authentication.

Actions

The API can perform two main actions: Search and Scrape URL.

Search

The Search action runs a web search on a specified search engine and returns the content of the search results page in Markdown format.

Scrape URL

The Scrape URL action retrieves the content of a specific URL. The API fetches the page, processes the HTML, and returns the main content as clean, readable Markdown.

Request Body

The request body must be a JSON object with the following structure (a vertical bar separates the allowed values for a field; send exactly one of them):

{
  "action": "Search" | "Scrape URL",
  "query": "string",
  "browser_name": "firefox" | "chromium" | "webkit",
  "search_engine_name": "string"
}

Parameters:

action (string, required): The action to perform. Must be either "Search" or "Scrape URL".
query (string, required): The search query or the URL to scrape.
browser_name (string, optional): The browser to use for the operation. Defaults to "firefox".
- Available options: "firefox", "chromium", "webkit".
search_engine_name (string, optional): The search engine to use when the action is "Search". Defaults to "DuckDuckGo".
- A full list of supported search engines can be found in the "Supported Search Engines" section.
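
The parameter rules above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API itself: the helper name build_payload and its validation behavior are assumptions made for illustration, and only the field names, allowed values, and defaults come from this document.

```python
# Hypothetical client-side helper; not provided by the Browser API.
ALLOWED_ACTIONS = {"Search", "Scrape URL"}
ALLOWED_BROWSERS = {"firefox", "chromium", "webkit"}


def build_payload(action, query, browser_name="firefox",
                  search_engine_name="DuckDuckGo"):
    """Return a request-body dict, raising ValueError on invalid input."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"action must be one of {sorted(ALLOWED_ACTIONS)}")
    if not isinstance(query, str) or not query.strip():
        raise ValueError("query must be a non-empty string")
    if browser_name not in ALLOWED_BROWSERS:
        raise ValueError(f"browser_name must be one of {sorted(ALLOWED_BROWSERS)}")

    payload = {"action": action, "query": query, "browser_name": browser_name}
    # Assumption: search_engine_name only matters for "Search",
    # so this sketch leaves it out of "Scrape URL" requests.
    if action == "Search":
        payload["search_engine_name"] = search_engine_name
    return payload
```

Calling build_payload("Search", "latest AI research") yields a dict with the documented defaults filled in, which can then be serialized with json.dumps.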

Response Body

The API will return a JSON object with the results of the operation.

On Success:

{
  "status": "success",
  "query": "your_query",
  "action": "Search" | "Scrape URL",
  "final_url": "https://example.com",
  "page_title": "Example Domain",
  "http_status": 200,
  "proxy_used": "Direct Connection",
  "markdown_content": "# Example Domain..."
}

On Error:

{
  "status": "error",
  "query": "your_query",
  "proxy_used": "Direct Connection",
  "error_message": "Navigation Timeout: The page for 'your_query' took too long to load."
}
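
Because both response shapes carry a "status" field, a client can branch on it. The sketch below assumes the response text has already been received as a JSON string; the names BrowseError and extract_markdown are hypothetical and exist only for this illustration.

```python
import json


class BrowseError(Exception):
    """Raised when the API reports status "error" (hypothetical helper)."""


def extract_markdown(response_text):
    """Return markdown_content on success, raise BrowseError on error."""
    data = json.loads(response_text)
    status = data.get("status")
    if status == "success":
        return data["markdown_content"]
    if status == "error":
        # Fall back to a generic message if error_message is missing.
        raise BrowseError(data.get("error_message", "unknown error"))
    # Any other status is not described in this document.
    raise ValueError(f"unexpected status: {status!r}")
```

Raising an exception for the error shape keeps the calling code from mistaking an error message for page content.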

Examples

The following examples call the API with curl.

Example 1: Performing a Search

This example performs a search for "latest AI research" using Google in the Chromium browser.

curl -X POST -H "Content-Type: application/json" \
-d '{
  "action": "Search",
  "query": "latest AI research",
  "browser_name": "chromium",
  "search_engine_name": "Google"
}' \
https://broadfield-dev-browser.hf.space/api/web_browse

Example 2: Scraping a URL

This example scrapes the content of the Wikipedia page for "Web scraping" using Firefox.

curl -X POST -H "Content-Type: application/json" \
-d '{
  "action": "Scrape URL",
  "query": "https://en.wikipedia.org/wiki/Web_scraping",
  "browser_name": "firefox"
}' \
https://broadfield-dev-browser.hf.space/api/web_browse
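
The same requests can be issued from Python using only the standard library. This is a rough equivalent of the curl commands above, not an official client: the function names make_request and web_browse are hypothetical, and the 120-second timeout is an assumed value chosen because page loads can be slow.

```python
import json
import urllib.request

API_URL = "https://broadfield-dev-browser.hf.space/api/web_browse"


def make_request(payload, url=API_URL):
    """Build (but do not send) a POST request carrying the JSON payload."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def web_browse(payload, timeout=120):
    """Send the request and return the decoded JSON response as a dict.

    The timeout is an assumption; urlopen raises urllib.error.HTTPError
    for non-2xx responses and urllib.error.URLError for network failures.
    """
    with urllib.request.urlopen(make_request(payload), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For instance, web_browse({"action": "Scrape URL", "query": "https://en.wikipedia.org/wiki/Web_scraping", "browser_name": "firefox"}) mirrors Example 2.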

Supported Search Engines

The following search engines are supported when using the "Search" action:

Google
DuckDuckGo
Bing
Brave
Ecosia
Yahoo
Startpage
Qwant
Swisscows
You.com
SearXNG
MetaGer
Yandex
Baidu
Perplexity
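
This document does not say whether search_engine_name is matched case-sensitively. Assuming the API expects the spelling shown in the list above, a client can normalize user input to that spelling before sending it. The helper canonical_engine below is a hypothetical illustration, not part of the API.

```python
# Names copied from the list above; the exact-spelling requirement is an assumption.
SUPPORTED_ENGINES = [
    "Google", "DuckDuckGo", "Bing", "Brave", "Ecosia",
    "Yahoo", "Startpage", "Qwant", "Swisscows", "You.com",
    "SearXNG", "MetaGer", "Yandex", "Baidu", "Perplexity",
]
_BY_LOWER = {name.lower(): name for name in SUPPORTED_ENGINES}


def canonical_engine(name):
    """Map a loosely typed engine name to the spelling used in this document."""
    try:
        return _BY_LOWER[name.strip().lower()]
    except KeyError:
        raise ValueError(f"unsupported search engine: {name!r}") from None
```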