Transformers.js Benchmark

Unified benchmarking tool for testing Transformers.js performance on both Node.js and browser environments.

Features

  • Unified CLI: Single entrypoint for both Node and Web benchmarks
  • Shared Core: Common benchmarking logic reduces code duplication
  • Platform Flexibility: Test on Node.js runtime or browsers (via Playwright)
  • Comprehensive Metrics: Measures load time, first inference, and subsequent inferences (p50, p90, raw)
  • Warm/Cold Modes: Test with or without caching
  • Batch Support: Run inference with configurable batch sizes
  • Device Options: Choose between WebGPU, WASM, or default CPU
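The p50/p90/raw summaries mentioned above could be computed along these lines. This is a minimal sketch, assuming a nearest-rank percentile; the method actually used in `metrics.ts` is not shown in this README, and the names `percentile` and `summarize` are hypothetical.

```typescript
// Hypothetical helpers; the tool's real metrics code may use a different method.
interface MetricSummary {
  p50: number;
  p90: number;
  raw: number[];
}

// Nearest-rank percentile: the smallest sample such that at least
// p percent of all samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) {
    throw new Error("percentile() needs at least one sample");
  }
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  const index = Math.min(sorted.length - 1, Math.max(0, rank - 1));
  return sorted[index];
}

// Bundle the raw timings with their p50 and p90.
function summarize(samples: number[]): MetricSummary {
  return {
    p50: percentile(samples, 50),
    p90: percentile(samples, 90),
    raw: samples,
  };
}
```

With only a handful of repeats, p90 under this definition is simply the slowest run, so a higher `--repeats` value gives a more meaningful tail figure.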

Installation

npm install
npm run bench:install  # Install Playwright browsers for web benchmarks

Usage

Node.js Benchmark

# Default (Node platform)
npm run bench -- <model> <task> [options]

# Explicit Node platform
npm run bench -- <model> <task> --platform=node [options]

# Direct script
npm run bench:node -- <model> <task> [options]

Example:

npm run bench -- Xenova/all-MiniLM-L6-v2 feature-extraction --mode=warm --repeats=3 --batch-size=1

Web (Browser) Benchmark

# Via unified CLI
npm run bench -- <model> <task> --platform=web [options]

# Direct script
npm run bench:web -- <model> <task> [options]

Example:

npm run bench -- Xenova/distilbert-base-uncased feature-extraction \
  --platform=web \
  --mode=warm \
  --device=webgpu \
  --repeats=3 \
  --batch-size=1

Development Server (Browser UI)

npm run dev

Then open http://localhost:5173 to use the interactive web interface.

Options

| Option | Description | Default | Values |
|--------|-------------|---------|--------|
| --platform | Runtime platform | node | node, web |
| --mode | Cache mode | warm | warm, cold |
| --repeats | Number of test iterations | 3 | Any positive integer |
| --batch-size | Batch size for inference | 1 | Any positive integer |
| --dtype | Data type precision | auto | fp32, fp16, q8, q4, etc. |
| --device | Device for web (browser only) | webgpu | webgpu, wasm |
| --browser | Browser type (web only) | chromium | chromium, firefox, webkit |
| --headed | Run browser in headed mode | false | true, false |
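To illustrate how `--key=value` flags and the defaults listed above fit together, here is a minimal parsing sketch. It is not the contents of `args.ts`; `BenchOptions`, `DEFAULTS`, and `parseOptions` are hypothetical names, and the choice to reject unknown flags is an assumption.

```typescript
// Hypothetical option shape mirroring the options table; not the tool's actual types.
interface BenchOptions {
  platform: "node" | "web";
  mode: "warm" | "cold";
  repeats: number;
  batchSize: number;
  dtype: string;
  device: string;
  browser: string;
  headed: boolean;
}

// Defaults copied from the options table.
const DEFAULTS: BenchOptions = {
  platform: "node", mode: "warm", repeats: 3, batchSize: 1,
  dtype: "auto", device: "webgpu", browser: "chromium", headed: false,
};

function parsePositiveInt(flag: string, value: string): number {
  const n = Number(value);
  if (!Number.isInteger(n) || n <= 0) {
    throw new Error(`--${flag} must be a positive integer, got "${value}"`);
  }
  return n;
}

function parseOptions(argv: string[]): BenchOptions {
  const opts: BenchOptions = { ...DEFAULTS };
  for (const arg of argv) {
    if (!arg.startsWith("--")) continue; // positional <model> and <task>
    const eq = arg.indexOf("=");
    const key = eq === -1 ? arg.slice(2) : arg.slice(2, eq);
    const value = eq === -1 ? "true" : arg.slice(eq + 1); // bare flag means true
    switch (key) {
      case "platform":
        if (value !== "node" && value !== "web") throw new Error(`Unknown platform: ${value}`);
        opts.platform = value;
        break;
      case "mode":
        if (value !== "warm" && value !== "cold") throw new Error(`Unknown mode: ${value}`);
        opts.mode = value;
        break;
      case "repeats": opts.repeats = parsePositiveInt(key, value); break;
      case "batch-size": opts.batchSize = parsePositiveInt(key, value); break;
      case "dtype": opts.dtype = value; break;     // not validated in this sketch
      case "device": opts.device = value; break;   // not validated in this sketch
      case "browser": opts.browser = value; break; // not validated in this sketch
      case "headed": opts.headed = value === "true"; break;
      default: throw new Error(`Unknown option: --${key}`);
    }
  }
  return opts;
}
```

For example, `parseOptions(["Xenova/all-MiniLM-L6-v2", "feature-extraction", "--mode=cold"])` keeps every default except `mode`.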

Environment Variables

| Variable | Description | Default | Example |
|----------|-------------|---------|---------|
| PLAYWRIGHT_TIMEOUT | Playwright page timeout (ms) | 300000 (5 min) | 1800000 (30 min) |
| NAVIGATION_TIMEOUT | Page navigation timeout (ms) | 60000 (1 min) | 120000 (2 min) |

Note: The default Playwright timeout is set to 5 minutes. For large models that require longer download times, you can increase this with environment variables:

# Set custom timeouts (in milliseconds)
PLAYWRIGHT_TIMEOUT=3600000 NAVIGATION_TIMEOUT=120000 npm run bench:web -- <model> <task>
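As a rough illustration of how these variables might be resolved against their defaults, consider the sketch below. The helper names are hypothetical, and falling back to the default on an unparsable value is an assumption about desirable behavior, not a description of what the tool does.

```typescript
// Defaults taken from the environment-variable table.
const DEFAULT_PLAYWRIGHT_TIMEOUT_MS = 300000; // 5 min
const DEFAULT_NAVIGATION_TIMEOUT_MS = 60000;  // 1 min

interface Timeouts {
  playwrightMs: number;
  navigationMs: number;
}

// Parse a millisecond value; use the fallback when unset, empty, or not a positive number.
function readTimeoutMs(raw: string | undefined, fallback: number): number {
  if (raw === undefined || raw.trim() === "") return fallback;
  const parsed = Number(raw);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : fallback;
}

// `env` is any string map, for example process.env in Node.js.
function resolveTimeouts(env: Record<string, string | undefined>): Timeouts {
  return {
    playwrightMs: readTimeoutMs(env["PLAYWRIGHT_TIMEOUT"], DEFAULT_PLAYWRIGHT_TIMEOUT_MS),
    navigationMs: readTimeoutMs(env["NAVIGATION_TIMEOUT"], DEFAULT_NAVIGATION_TIMEOUT_MS),
  };
}
```

The resolved values could then be passed to Playwright's `page.setDefaultTimeout()` and `page.setDefaultNavigationTimeout()`; whether the benchmark CLI wires them up exactly this way is not confirmed by this README.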

Architecture

bench/
├── src/
│   ├── core/              # Shared benchmarking logic
│   │   ├── args.ts        # CLI argument parsing
│   │   ├── metrics.ts     # Statistics & aggregation
│   │   └── types.ts       # TypeScript interfaces
│   ├── node/              # Node.js implementation
│   │   ├── benchmark.ts   # Node benchmark runner
│   │   ├── cache.ts       # Filesystem cache management
│   │   └── index.ts       # Node CLI entry
│   ├── web/               # Browser implementation
│   │   ├── benchmark.ts   # Browser benchmark runner
│   │   ├── cache.ts       # Browser cache management (IndexedDB, etc.)
│   │   ├── cli.ts         # Playwright CLI for headless browser
│   │   └── main.ts        # Browser UI
│   └── index.ts           # Unified CLI router
├── index.html             # Browser UI HTML
├── package.json
├── tsconfig.json
└── vite.config.ts

⚠️ Important: WebGPU Headless Limitations

WebGPU performance in headless mode is significantly degraded (~25× slower than headed mode on macOS). This is a known limitation of headless browsers:

  • Headless mode: Uses software rendering, giving misleading results
  • Headed mode (--headed=true): Uses actual GPU, reflects real performance
  • Interactive UI (npm run dev): Best for accurate WebGPU benchmarks

Recommendation: For WebGPU benchmarks, always use --headed=true or test interactively in a browser.
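If you script benchmark runs, a small guard can flag configurations likely to hit this limitation before any numbers are recorded. The function below is a hypothetical sketch and not part of the tool; it only inspects the option values described in this README.

```typescript
// Hypothetical guard: returns a warning for runs that combine the web platform,
// the webgpu device, and headless mode; returns null otherwise.
function webgpuHeadlessWarning(opts: {
  platform: string;
  device: string;
  headed: boolean;
}): string | null {
  if (opts.platform === "web" && opts.device === "webgpu" && !opts.headed) {
    return "WebGPU in headless mode may fall back to software rendering; " +
      "rerun with --headed=true for representative numbers.";
  }
  return null;
}
```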

See: Chrome WebGPU Testing Guide | Playwright GPU Issues


Output Format

All benchmarks output JSON with the following structure:

{
  "platform": "node" | "browser",
  "runtime": "<runtime version or user agent>",
  "model": "<model-id>",
  "task": "<task-name>",
  "mode": "warm" | "cold",
  "repeats": 3,
  "batchSize": 1,
  "dtype": "<dtype>",
  "metrics": {
    "load_ms": {
      "p50": 70.5,
      "p90": 75.2,
      "raw": [67.3, 70.5, 75.2]
    },
    "first_infer_ms": {
      "p50": 3.2,
      "p90": 4.1,
      "raw": [3.1, 3.2, 4.1]
    },
    "subsequent_infer_ms": {
      "p50": 2.4,
      "p90": 2.8,
      "raw": [2.0, 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8]
    }
  }
}
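For downstream scripts that consume this JSON, the structure above can be written as TypeScript types with a light runtime check. This is a sketch derived only from the sample output; the interfaces in `types.ts` may differ, and `parseBenchmarkResult` is a hypothetical helper that validates just the `metrics` section.

```typescript
interface MetricSummary {
  p50: number;
  p90: number;
  raw: number[];
}

// Field names follow the sample output; this is not the tool's own type definition.
interface BenchmarkResult {
  platform: "node" | "browser";
  runtime: string;
  model: string;
  task: string;
  mode: "warm" | "cold";
  repeats: number;
  batchSize: number;
  dtype: string;
  metrics: {
    load_ms: MetricSummary;
    first_infer_ms: MetricSummary;
    subsequent_infer_ms: MetricSummary;
  };
}

function isMetricSummary(v: unknown): v is MetricSummary {
  if (typeof v !== "object" || v === null) return false;
  const m = v as Record<string, unknown>;
  const raw = m["raw"];
  return (
    typeof m["p50"] === "number" &&
    typeof m["p90"] === "number" &&
    Array.isArray(raw) &&
    raw.every((x) => typeof x === "number")
  );
}

// Parses benchmark output and checks that all three metric summaries are present.
// Other fields are trusted as-is in this sketch.
function parseBenchmarkResult(json: string): BenchmarkResult {
  const data: unknown = JSON.parse(json);
  if (typeof data !== "object" || data === null) {
    throw new Error("Expected a JSON object");
  }
  const metrics = (data as Record<string, unknown>)["metrics"];
  if (typeof metrics !== "object" || metrics === null) {
    throw new Error("Missing metrics object");
  }
  const m = metrics as Record<string, unknown>;
  for (const key of ["load_ms", "first_infer_ms", "subsequent_infer_ms"]) {
    if (!isMetricSummary(m[key])) {
      throw new Error(`Invalid or missing metric: ${key}`);
    }
  }
  return data as BenchmarkResult;
}
```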

Examples

Compare Node vs Browser

# Node
npm run bench -- Xenova/distilbert-base-uncased feature-extraction --repeats=5

# Browser (WebGPU)
npm run bench -- Xenova/distilbert-base-uncased feature-extraction \
  --platform=web --device=webgpu --repeats=5

# Browser (WASM)
npm run bench -- Xenova/distilbert-base-uncased feature-extraction \
  --platform=web --device=wasm --repeats=5
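Once two runs have produced JSON, their p50 values can be compared directly. The helper below is a hypothetical sketch that assumes the metric names from the Output Format section; it is not shipped with the tool.

```typescript
interface MetricSummary {
  p50: number;
  p90: number;
  raw: number[];
}

type MetricName = "load_ms" | "first_infer_ms" | "subsequent_infer_ms";
type Metrics = Record<MetricName, MetricSummary>;

// Ratio of candidate p50 to baseline p50 per metric.
// A value above 1 means the candidate run was slower than the baseline.
function compareP50(baseline: Metrics, candidate: Metrics): Record<MetricName, number> {
  return {
    load_ms: candidate.load_ms.p50 / baseline.load_ms.p50,
    first_infer_ms: candidate.first_infer_ms.p50 / baseline.first_infer_ms.p50,
    subsequent_infer_ms: candidate.subsequent_infer_ms.p50 / baseline.subsequent_infer_ms.p50,
  };
}
```

Pass the `metrics` object from the Node run as `baseline` and the one from a browser run as `candidate` (or the reverse) to see where the two platforms diverge.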

Test Different Quantizations

npm run bench -- Xenova/distilbert-base-uncased feature-extraction --dtype=fp32
npm run bench -- Xenova/distilbert-base-uncased feature-extraction --dtype=fp16
npm run bench -- Xenova/distilbert-base-uncased feature-extraction --dtype=q8

Batch Processing

npm run bench -- Xenova/distilbert-base-uncased feature-extraction --batch-size=8 --repeats=3

Development

# Build TypeScript
npm run build

# Run dev server for browser UI
npm run dev

# Preview built browser app
npm run preview