Benchmark ID and File Organization
Overview
Benchmark results are organized using a deterministic ID system that groups results with identical settings into the same file. The ID is structured hierarchically with task at the top level, followed by model ID, and finally encoded parameters.
Directory Structure
Results are stored in nested directories with task as the top level:
benchmark-results/
└── {task}/
    └── {org}/
        └── {model-name}/
            ├── {params1}.jsonl
            ├── {params2}.jsonl
            └── {params3}.jsonl
ID Format
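Full ID: {task}/{modelId}/{platform}_{mode}_{device}_{dtype}_{batch}_{browser}_{headed}
Optional components that are omitted are dropped together with their separating underscore (e.g. node_cold_cpu_b1).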
Components
Task (top-level directory): The transformers task
- Examples: feature-extraction, text-classification, text-generation, sentiment-analysis
- Rationale: Tasks are fundamentally different operations, so they form the primary organization
Model ID (nested directory path): Full model identifier with organization
- Examples: Xenova/all-MiniLM-L6-v2, meta-llama/Llama-2-7b
- Preserved as-is, including slashes for directory structure
Platform: node or web
Mode: warm or cold
Device: Execution device
- Node.js: cpu (default), webgpu
- Web: webgpu (default), wasm
DType (optional): Model data type
- Examples: fp32, fp16, q8, q4, int8
- Omitted if not specified
Batch Size: Always included as b{N}
- Examples: b1, b4, b32
Browser (web only): Browser type
- Examples: chromium, firefox, webkit
- Omitted for Node.js benchmarks
Headed (web only): Display mode
- Included as the literal headed only if true
- Omitted for headless mode or Node.js benchmarks
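The encoding rules above can be sketched in TypeScript. This is a minimal illustration under stated assumptions, not the project's actual implementation: the BenchmarkSettings type, its field names, and the encodeParams/buildBenchmarkId helpers are hypothetical. It applies the documented device defaults, omits dtype when unset, always emits b{N}, and emits browser and headed only for web runs.

```typescript
// Hypothetical settings type; field names are illustrative assumptions.
interface BenchmarkSettings {
  task: string; // e.g. "feature-extraction"
  modelId: string; // e.g. "Xenova/all-MiniLM-L6-v2", kept as-is including the slash
  platform: "node" | "web";
  mode: "warm" | "cold";
  device?: string; // documented defaults: "cpu" on Node.js, "webgpu" on web
  dtype?: string; // e.g. "fp16", "q8"; omitted when not specified
  batchSize: number; // always encoded as b{N}
  browser?: string; // web only, e.g. "chromium"
  headed?: boolean; // web only; encoded only when true
}

// Encodes the parameter part of the ID (the filename without ".jsonl").
function encodeParams(s: BenchmarkSettings): string {
  const device = s.device ?? (s.platform === "node" ? "cpu" : "webgpu");
  const parts: string[] = [s.platform, s.mode, device];
  if (s.dtype) parts.push(s.dtype); // dropped entirely when unset
  parts.push(`b${s.batchSize}`); // batch size is always present
  if (s.platform === "web") {
    if (s.browser) parts.push(s.browser); // browser only applies to web runs
    if (s.headed) parts.push("headed"); // headless runs get no token
  }
  return parts.join("_");
}

// Full ID: {task}/{modelId}/{params}; the slashes become directory levels.
function buildBenchmarkId(s: BenchmarkSettings): string {
  return `${s.task}/${s.modelId}/${encodeParams(s)}`;
}
```

Because the ID is a pure function of the settings, two runs with identical settings always produce the same ID and therefore land in the same JSONL file. For example, a cold Node.js run with no dtype at batch size 1 on Xenova/all-MiniLM-L6-v2 yields feature-extraction/Xenova/all-MiniLM-L6-v2/node_cold_cpu_b1.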
Examples
Node.js Benchmarks
feature-extraction/Xenova/all-MiniLM-L6-v2/node_warm_cpu_fp32_b1.jsonl
feature-extraction/Xenova/all-MiniLM-L6-v2/node_warm_webgpu_fp16_b4.jsonl
feature-extraction/Xenova/all-MiniLM-L6-v2/node_cold_cpu_b1.jsonl
text-generation/meta-llama/Llama-2-7b/node_warm_cpu_q4_b1.jsonl
Web Benchmarks
feature-extraction/Xenova/distilbert-base-uncased/web_warm_wasm_b1_chromium.jsonl
feature-extraction/Xenova/distilbert-base-uncased/web_warm_wasm_q8_b1_firefox.jsonl
feature-extraction/Xenova/distilbert-base-uncased/web_warm_webgpu_fp16_b1_chromium_headed.jsonl
feature-extraction/Xenova/roberta-large-mnli/web_cold_wasm_b1_chromium.jsonl
Mixed Tasks and Models
benchmark-results/
├── feature-extraction/
│   └── Xenova/
│       ├── all-MiniLM-L6-v2/
│       │   ├── node_warm_cpu_fp32_b1.jsonl
│       │   └── web_warm_wasm_b1_chromium.jsonl
│       └── distilbert-base-uncased/
│           └── node_warm_webgpu_fp16_b1.jsonl
└── text-classification/
    └── Xenova/
        └── distilbert-base-uncased/
            └── node_warm_cpu_fp32_b1.jsonl
File Format
Each file is in JSONL (JSON Lines) format, with one benchmark result per line. This allows:
- Appending new results without parsing the entire file
- Streaming large result sets
- Easy analysis with tools like jq
Example:
{"id":"uuid1","platform":"node","modelId":"Xenova/all-MiniLM-L6-v2","task":"feature-extraction",...}
{"id":"uuid2","platform":"node","modelId":"Xenova/all-MiniLM-L6-v2","task":"feature-extraction",...}
{"id":"uuid3","platform":"node","modelId":"Xenova/all-MiniLM-L6-v2","task":"feature-extraction",...}
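The append-only property can be illustrated with two small helpers. This is a sketch, assuming only the four fields visible in the example lines; the BenchmarkRecord type and the toJsonlLine/parseJsonl names are hypothetical. JSON.stringify without an indent argument escapes embedded newlines, so each record always occupies exactly one line.

```typescript
// Minimal record shape: only the fields visible in the example are typed;
// everything else is kept as unknown (an assumption about the schema).
interface BenchmarkRecord {
  id: string;
  platform: string;
  modelId: string;
  task: string;
  [extra: string]: unknown;
}

// Serializes one record as a single JSONL line, including the trailing newline.
function toJsonlLine(record: BenchmarkRecord): string {
  return JSON.stringify(record) + "\n";
}

// Parses a JSONL document, skipping blank lines such as the final empty one.
function parseJsonl(text: string): BenchmarkRecord[] {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as BenchmarkRecord);
}
```

Appending a result only requires writing toJsonlLine(record) to the end of the file (in Node.js, for instance, with fs.promises.appendFile); existing lines never have to be read or rewritten.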
Querying Results
API Endpoints
Get all results:
curl http://localhost:7860/api/benchmarks
Get results by model:
curl "http://localhost:7860/api/benchmarks?modelId=Xenova/all-MiniLM-L6-v2"
Get a specific benchmark:
curl http://localhost:7860/api/benchmark/{uuid}
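The same requests can be built from TypeScript. This is a hypothetical sketch: the helper names and DEFAULT_BASE_URL are illustrative, while the host, port, and paths come from the curl examples. It assumes the {uuid} path segment is the id field stored in each JSONL line, and that the server decodes query parameters normally, since encodeURIComponent turns the slash in a model ID into %2F.

```typescript
// Base URL taken from the curl examples; adjust for other deployments.
const DEFAULT_BASE_URL = "http://localhost:7860";

// URL listing all results, optionally filtered by model ID.
function benchmarksUrl(modelId?: string, baseUrl: string = DEFAULT_BASE_URL): string {
  const url = `${baseUrl}/api/benchmarks`;
  return modelId === undefined ? url : `${url}?modelId=${encodeURIComponent(modelId)}`;
}

// URL for a single result; assumes uuid is the record's "id" field.
function benchmarkUrl(uuid: string, baseUrl: string = DEFAULT_BASE_URL): string {
  return `${baseUrl}/api/benchmark/${encodeURIComponent(uuid)}`;
}
```

In Node.js 18 or later the resulting string can be passed to the global fetch, e.g. await fetch(benchmarksUrl("Xenova/all-MiniLM-L6-v2")); the response format is not specified in this document.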
Direct File Access
Results can also be queried directly from the filesystem:
# All results for a specific task (one * each for the org and model levels)
jq . benchmark-results/feature-extraction/*/*/*.jsonl
# All results for a specific model across all tasks
jq . benchmark-results/*/Xenova/all-MiniLM-L6-v2/*.jsonl
# All results for a specific model and task
jq . benchmark-results/feature-extraction/Xenova/all-MiniLM-L6-v2/*.jsonl
# Specific configuration
jq . benchmark-results/feature-extraction/Xenova/all-MiniLM-L6-v2/node_warm_cpu_fp32_b1.jsonl
# Count results per configuration
wc -l benchmark-results/feature-extraction/Xenova/all-MiniLM-L6-v2/*.jsonl
# Filter by device across all models (web runs on wasm)
jq . benchmark-results/feature-extraction/*/*/web_*_wasm_*.jsonl
# Compare same model across different tasks
jq . benchmark-results/*/Xenova/distilbert-base-uncased/node_warm_cpu_fp32_b1.jsonl
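Glob patterns work for simple filters, but a filename can also be decoded back into its components, which makes filters such as "web runs on wasm" explicit. A minimal sketch, assuming no individual component contains an underscore and that only the batch token has the form b{N}; the ParsedParams type and parseParamsFilename are hypothetical names.

```typescript
// Components recovered from a parameter filename (names are illustrative).
interface ParsedParams {
  platform: string;
  mode: string;
  device: string;
  dtype?: string;
  batchSize: number;
  browser?: string;
  headed: boolean;
}

// Assumption: the batch token is the only token of the form b{N}.
const BATCH_TOKEN = /^b\d+$/;

// Parses "{platform}_{mode}_{device}[_{dtype}]_b{N}[_{browser}][_headed].jsonl".
function parseParamsFilename(filename: string): ParsedParams {
  const tokens = filename.replace(/\.jsonl$/, "").split("_");
  if (tokens.length < 4) throw new Error(`Malformed filename: ${filename}`);
  const [platform, mode, device, ...rest] = tokens;

  // dtype is optional, so the batch token sits at rest[0] or rest[1].
  let i = 0;
  let dtype: string | undefined;
  if (!BATCH_TOKEN.test(rest[0])) {
    dtype = rest[0];
    i = 1;
  }
  if (i >= rest.length || !BATCH_TOKEN.test(rest[i])) {
    throw new Error(`Missing batch token in: ${filename}`);
  }
  const batchSize = Number(rest[i].slice(1));

  // Anything after the batch token is [browser][headed] (web runs only).
  const tail = rest.slice(i + 1);
  const headed = tail.length > 0 && tail[tail.length - 1] === "headed";
  if (headed) tail.pop();
  if (tail.length > 1) throw new Error(`Unexpected tokens in: ${filename}`);
  const browser = tail.length === 1 ? tail[0] : undefined;

  return { platform, mode, device, dtype, batchSize, browser, headed };
}
```

Filtering a list of filenames with p.platform === "web" && p.device === "wasm" then mirrors the web_*_wasm_*.jsonl glob, while also rejecting names that do not follow the format.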
Benefits
- Task-First Organization: Primary organization by task type, as models are typically designed for specific tasks
- Grouping: Multiple runs with identical settings are stored together in JSONL files
- Easy Comparison: Compare different models on the same task, or same model across different tasks
- Organization: Clear hierarchy: task → org → model → configurations
- Readability: Filenames are human-readable and self-documenting
- Searchability: Easy to find specific configurations using filesystem tools and glob patterns
- Scalability: Nesting by task, org, and model keeps individual directories small as the number of models and tasks grows
- Model ID Preservation: Full model IDs maintained without sanitization, preserving org/model structure
Configuration
The base directory can be customized via the BENCHMARK_RESULTS_DIR environment variable:
export BENCHMARK_RESULTS_DIR=/path/to/results
npm run server
Default: ./benchmark-results
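Putting the pieces together, a result file's path is the base directory followed by the full ID and .jsonl. A minimal sketch with hypothetical helper names; it takes the environment as a parameter so it can be tested, uses "/" separators, and treats an empty BENCHMARK_RESULTS_DIR the same as an unset one (an assumption, not documented behavior).

```typescript
// Directory used when BENCHMARK_RESULTS_DIR is unset (from the documentation).
const DEFAULT_RESULTS_DIR = "./benchmark-results";

// Resolves the base directory; an empty value is treated as unset (assumption).
function resolveResultsDir(env: Record<string, string | undefined>): string {
  const dir = env["BENCHMARK_RESULTS_DIR"];
  return dir !== undefined && dir !== "" ? dir : DEFAULT_RESULTS_DIR;
}

// Builds the JSONL path for a full ID such as
// "feature-extraction/Xenova/all-MiniLM-L6-v2/node_warm_cpu_fp32_b1".
function resultsFilePath(env: Record<string, string | undefined>, benchmarkId: string): string {
  const base = resolveResultsDir(env).replace(/\/+$/, ""); // drop trailing slashes
  return `${base}/${benchmarkId}.jsonl`;
}
```

In a Node.js process, process.env can be passed directly as the env argument.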