Benchmark ID and File Organization

Overview

Benchmark results are organized using a deterministic ID system that groups results with identical settings into the same file. The ID is structured hierarchically with task at the top level, followed by model ID, and finally encoded parameters.

Directory Structure

Results are stored in nested directories with task as the top level:

benchmark-results/
├── {task}/
│   └── {org}/
│       └── {model-name}/
│           ├── {params1}.jsonl
│           ├── {params2}.jsonl
│           └── {params3}.jsonl
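
As a rough illustration of the layout above, the directory for a given task and model could be derived as follows. This is a minimal sketch, not the backend's actual code: the helper name `resultDir` and its signature are assumptions made for this example.

```typescript
import * as path from "node:path";

// Hypothetical helper (name and signature are assumptions for this sketch).
// Maps a task and a model ID to the directory that holds its result files.
function resultDir(baseDir: string, task: string, modelId: string): string {
  // The model ID keeps its slash, so "org/model" becomes two nested directories.
  // path.posix is used so the result matches the forward-slash paths in this document.
  return path.posix.join(baseDir, task, ...modelId.split("/"));
}

console.log(resultDir("benchmark-results", "feature-extraction", "Xenova/all-MiniLM-L6-v2"));
// benchmark-results/feature-extraction/Xenova/all-MiniLM-L6-v2
```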

ID Format

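Full ID: {task}/{modelId}/{platform}_{mode}_{device}_{dtype}_{batch}_{browser}_{headed}

Optional components (dtype, browser, headed) are dropped together with their underscore when they do not apply. Result files use the ID with a .jsonl extension.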

Components

- Task (top-level directory): The transformers task
  - Examples: feature-extraction, text-classification, text-generation, sentiment-analysis
  - Rationale: Tasks are fundamentally different operations, so they form the primary organization
- Model ID (nested directory path): Full model identifier with organization
  - Examples: Xenova/all-MiniLM-L6-v2, meta-llama/Llama-2-7b
  - Preserved as-is, including slashes for directory structure
- Platform: node or web
- Mode: warm or cold
- Device: Execution device
  - Node.js: cpu (default), webgpu
  - Web: webgpu (default), wasm
- DType (optional): Model data type
  - Examples: fp32, fp16, q8, q4, int8
  - Omitted if not specified
- Batch Size: Always included as b{N}
  - Examples: b1, b4, b32
- Browser (web only): Browser type
  - Examples: chromium, firefox, webkit
  - Omitted for Node.js benchmarks
- Headed (web only): Display mode
  - Included as headed only if true
  - Omitted for headless mode or Node.js benchmarks
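
The component rules above can be sketched as a small function. This is an illustration under assumptions, not the backend's implementation: the `BenchmarkSettings` interface, its field names, and the `benchmarkId` function are hypothetical. Only the ordering, the defaults, and the omission rules come from this document.

```typescript
// Hypothetical settings shape; the field names are assumptions for this sketch.
interface BenchmarkSettings {
  task: string;
  modelId: string;
  platform: "node" | "web";
  mode: "warm" | "cold";
  device?: string;   // defaults to cpu on Node.js and webgpu on the web
  dtype?: string;    // optional
  batchSize: number;
  browser?: string;  // web only
  headed?: boolean;  // web only
}

// Builds the full ID: {task}/{modelId}/{encoded parameters}.
function benchmarkId(s: BenchmarkSettings): string {
  const device = s.device ?? (s.platform === "node" ? "cpu" : "webgpu");
  const parts: string[] = [s.platform, s.mode, device];
  if (s.dtype) parts.push(s.dtype);     // omitted when not specified
  parts.push(`b${s.batchSize}`);        // always included
  if (s.platform === "web") {
    if (s.browser) parts.push(s.browser);
    if (s.headed) parts.push("headed"); // included only when true
  }
  return `${s.task}/${s.modelId}/${parts.join("_")}`;
}

console.log(
  benchmarkId({
    task: "feature-extraction",
    modelId: "Xenova/all-MiniLM-L6-v2",
    platform: "node",
    mode: "warm",
    dtype: "fp32",
    batchSize: 1,
  })
);
// feature-extraction/Xenova/all-MiniLM-L6-v2/node_warm_cpu_fp32_b1
```

Because every component is appended in a fixed order and optional ones are simply skipped, two runs with identical settings always produce the same ID and therefore land in the same file.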

Examples

Node.js Benchmarks

feature-extraction/Xenova/all-MiniLM-L6-v2/node_warm_cpu_fp32_b1.jsonl
feature-extraction/Xenova/all-MiniLM-L6-v2/node_warm_webgpu_fp16_b4.jsonl
feature-extraction/Xenova/all-MiniLM-L6-v2/node_cold_cpu_b1.jsonl
text-generation/meta-llama/Llama-2-7b/node_warm_cpu_q4_b1.jsonl

Web Benchmarks

feature-extraction/Xenova/distilbert-base-uncased/web_warm_wasm_b1_chromium.jsonl
feature-extraction/Xenova/distilbert-base-uncased/web_warm_wasm_q8_b1_firefox.jsonl
feature-extraction/Xenova/distilbert-base-uncased/web_warm_webgpu_fp16_b1_chromium_headed.jsonl
feature-extraction/Xenova/roberta-large-mnli/web_cold_wasm_b1_chromium.jsonl
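
Going the other way, a filename like the ones above can be decoded back into its settings. The sketch below is one possible approach, assuming that no component contains an underscore and that no dtype has the form b{N}. The `ParsedParams` type and `parseParams` function are hypothetical names introduced for this example.

```typescript
// Hypothetical parsed shape; names are assumptions for this sketch.
interface ParsedParams {
  platform: string;
  mode: string;
  device: string;
  dtype: string | undefined;
  batchSize: number;
  browser: string | undefined;
  headed: boolean;
}

function parseParams(fileName: string): ParsedParams {
  const tokens = fileName.replace(/\.jsonl$/, "").split("_");
  const [platform, mode, device, ...rest] = tokens;
  if (!platform || !mode || !device) {
    throw new Error(`Too few components in "${fileName}"`);
  }
  // The dtype is optional: it is present only if the next token is not b{N}.
  const dtype = /^b\d+$/.test(rest[0] ?? "") ? undefined : rest.shift();
  const batch = rest.shift();
  if (!batch || !/^b\d+$/.test(batch)) {
    throw new Error(`Missing batch size in "${fileName}"`);
  }
  // "headed", when present, is always the last token.
  const headed = rest[rest.length - 1] === "headed";
  if (headed) rest.pop();
  const browser = rest.shift(); // undefined for Node.js benchmarks
  return { platform, mode, device, dtype, batchSize: Number(batch.slice(1)), browser, headed };
}

console.log(parseParams("web_warm_webgpu_fp16_b1_chromium_headed.jsonl"));
```

The batch token b{N} acts as the anchor: everything optional before it is the dtype, and everything after it is web-only.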

Mixed Tasks and Models

benchmark-results/
├── feature-extraction/
│   └── Xenova/
│       ├── all-MiniLM-L6-v2/
│       │   ├── node_warm_cpu_fp32_b1.jsonl
│       │   └── web_warm_wasm_b1_chromium.jsonl
│       └── distilbert-base-uncased/
│           └── node_warm_webgpu_fp16_b1.jsonl
└── text-classification/
    └── Xenova/
        └── distilbert-base-uncased/
            └── node_warm_cpu_fp32_b1.jsonl

File Format

Each file is in JSONL (JSON Lines) format, with one benchmark result per line. This allows:

- Appending new results without parsing the entire file
- Streaming large result sets
- Easy analysis with tools like jq

Example:

{"id":"uuid1","platform":"node","modelId":"Xenova/all-MiniLM-L6-v2","task":"feature-extraction",...}
{"id":"uuid2","platform":"node","modelId":"Xenova/all-MiniLM-L6-v2","task":"feature-extraction",...}
{"id":"uuid3","platform":"node","modelId":"Xenova/all-MiniLM-L6-v2","task":"feature-extraction",...}
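
The append-only property can be sketched with Node's built-in `fs` module. This is a minimal illustration: the helper names `appendResult` and `readResults` and the sample records are assumptions, not the backend's actual code or schema.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Appends one result as a single line; existing lines are never re-parsed.
function appendResult(file: string, result: object): void {
  fs.mkdirSync(path.dirname(file), { recursive: true });
  fs.appendFileSync(file, JSON.stringify(result) + "\n");
}

// Reads every result back, skipping blank lines.
function readResults(file: string): unknown[] {
  return fs
    .readFileSync(file, "utf8")
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line): unknown => JSON.parse(line));
}

// Usage with a temporary directory and made-up records.
const tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), "bench-"));
const resultFile = path.join(
  tmpDir, "feature-extraction", "Xenova", "all-MiniLM-L6-v2", "node_warm_cpu_fp32_b1.jsonl"
);
appendResult(resultFile, { id: "example-1", platform: "node" });
appendResult(resultFile, { id: "example-2", platform: "node" });
console.log(readResults(resultFile).length); // 2
```

`JSON.stringify` never emits raw newlines, so each record is guaranteed to occupy exactly one line.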

Querying Results

API Endpoints

Get all results:

curl http://localhost:7860/api/benchmarks

Get results by model:

curl "http://localhost:7860/api/benchmarks?modelId=Xenova/all-MiniLM-L6-v2"

Get a specific benchmark (replace {uuid} with the id of a result):

curl http://localhost:7860/api/benchmark/{uuid}
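
The same requests can be made from code. The sketch below assumes the server from the curl examples and a runtime with a global `fetch` (Node.js 18 or later). The helper names are hypothetical, and the response is typed as `unknown` because its shape is not specified here.

```typescript
// Hypothetical helper: builds the list URL, optionally filtered by model.
function benchmarksUrl(base: string, modelId?: string): string {
  const url = new URL("/api/benchmarks", base);
  // searchParams percent-encodes the slash in the model ID (as %2F);
  // standard query-string parsers decode it back to the original value.
  if (modelId) url.searchParams.set("modelId", modelId);
  return url.toString();
}

// Hypothetical helper: fetches and decodes the JSON response.
async function fetchBenchmarks(base: string, modelId?: string): Promise<unknown> {
  const res = await fetch(benchmarksUrl(base, modelId));
  if (!res.ok) throw new Error(`Request failed with status ${res.status}`);
  return res.json();
}

console.log(benchmarksUrl("http://localhost:7860", "Xenova/all-MiniLM-L6-v2"));
// http://localhost:7860/api/benchmarks?modelId=Xenova%2Fall-MiniLM-L6-v2
```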

Direct File Access

Results can also be queried directly from the filesystem:

# All results for a specific task
# (explicit */* rather than **, which recurses in bash only when globstar is enabled)
cat benchmark-results/feature-extraction/*/*/*.jsonl | jq .

# All results for a specific model across all tasks
cat benchmark-results/*/Xenova/all-MiniLM-L6-v2/*.jsonl | jq .

# All results for a specific model and task
cat benchmark-results/feature-extraction/Xenova/all-MiniLM-L6-v2/*.jsonl | jq .

# Specific configuration
cat benchmark-results/feature-extraction/Xenova/all-MiniLM-L6-v2/node_warm_cpu_fp32_b1.jsonl | jq .

# Count results per configuration
wc -l benchmark-results/feature-extraction/Xenova/all-MiniLM-L6-v2/*.jsonl

# Filter by platform and device (web + wasm) across all models for a task
cat benchmark-results/feature-extraction/*/*/web_*_wasm_*.jsonl | jq .

# Compare same model across different tasks
cat benchmark-results/*/Xenova/distilbert-base-uncased/node_warm_cpu_fp32_b1.jsonl | jq .
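
Where shell globs are not available, the same traversal can be done in code. The following is a minimal sketch that walks the results tree and counts results per configuration file. The function names are assumptions made for this example.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Recursively collects every .jsonl file under a directory.
function listResultFiles(dir: string): string[] {
  const files: string[] = [];
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) files.push(...listResultFiles(full));
    else if (entry.name.endsWith(".jsonl")) files.push(full);
  }
  return files;
}

// Counts non-empty lines (one result each) per configuration file,
// keyed by the file's path relative to the base directory.
function countResults(baseDir: string): Map<string, number> {
  const counts = new Map<string, number>();
  for (const file of listResultFiles(baseDir)) {
    const lines = fs
      .readFileSync(file, "utf8")
      .split("\n")
      .filter((line) => line.trim() !== "");
    counts.set(path.relative(baseDir, file), lines.length);
  }
  return counts;
}

// Usage: print counts if the default results directory exists.
if (fs.existsSync("./benchmark-results")) {
  countResults("./benchmark-results").forEach((n, file) => console.log(n, file));
}
```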

Benefits

- Task-First Organization: Primary organization by task type, as models are typically designed for specific tasks
- Grouping: Multiple runs with identical settings are stored together in JSONL files
- Easy Comparison: Compare different models on the same task, or same model across different tasks
- Organization: Clear hierarchy: task → org → model → configurations
- Readability: Filenames are human-readable and self-documenting
- Searchability: Easy to find specific configurations using filesystem tools and glob patterns
- Scalability: Nested directory structure handles thousands of models and tasks
- Model ID Preservation: Full model IDs maintained without sanitization, preserving org/model structure

Configuration

The base directory can be customized via the BENCHMARK_RESULTS_DIR environment variable:

export BENCHMARK_RESULTS_DIR=/path/to/results
npm run server

Default: ./benchmark-results
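
On the reading side, resolving the base directory might look like the sketch below. The variable name and the default come from this document. The helper name `resultsBaseDir`, the treatment of an empty value as unset, and the resolution to an absolute path are assumptions made for this example.

```typescript
import * as path from "node:path";

// Hypothetical helper: picks the results directory from the environment,
// falling back to the documented default.
function resultsBaseDir(
  env: Record<string, string | undefined> = process.env
): string {
  // "||" rather than "??" so that an empty string also falls back to the default.
  const configured = env["BENCHMARK_RESULTS_DIR"] || "./benchmark-results";
  // Resolved against the current working directory (an assumption of this sketch).
  return path.resolve(configured);
}

console.log(resultsBaseDir());
```

Passing the environment as a parameter keeps the function easy to test without mutating `process.env`.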