# Hugging Face Dataset Integration
The benchmark server can automatically upload results to a Hugging Face Dataset repository for centralized storage and sharing.
## Features
- **Automatic Upload**: Results are automatically pushed to the HF Dataset when benchmarks complete
- **File Structure Preservation**: Uses the same path structure as local storage: `{task}/{org}/{model}/{params}.json` (see the sketch after this list)
- **JSON Format**: Results are stored as JSON (not JSONL) for better Dataset compatibility
- **Overwrite Strategy**: Each configuration gets a single file that is overwritten with the latest result
- **Error Tracking**: Failed benchmarks are also uploaded to track issues
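
For illustration, here is a minimal sketch of how such a path might be derived from a benchmark configuration. The helper name and exact parameter encoding are assumptions, simplified to the node-style filenames shown later in this document (web results also encode the browser):

```ts
// Hypothetical helper: derives the dataset file path for one configuration.
// Field names follow the JSON format shown later in this document.
interface BenchmarkConfig {
  task: string;      // e.g. "feature-extraction"
  modelId: string;   // "{org}/{model}", e.g. "Xenova/all-MiniLM-L6-v2"
  platform: string;  // "node" or "web"
  mode: string;      // e.g. "warm"
  device: string;    // e.g. "cpu"
  dtype: string;     // e.g. "fp32"
  batchSize: number; // e.g. 1
}

function buildResultPath(c: BenchmarkConfig): string {
  const params = [c.platform, c.mode, c.device, c.dtype, `b${c.batchSize}`].join("_");
  return `${c.task}/${c.modelId}/${params}.json`;
  // -> "feature-extraction/Xenova/all-MiniLM-L6-v2/node_warm_cpu_fp32_b1.json"
}
```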
## Setup
### 1. Create a Hugging Face Dataset
- Go to https://huggingface.co/new-dataset
- Create a new dataset (e.g., `username/transformersjs-benchmark-results`)
- Keep it public or private based on your needs
### 2. Get Your HF Token
- Go to https://huggingface.co/settings/tokens
- Create a new token with `write` permissions
- Copy the token
### 3. Configure Environment Variables
Create or update the `.env` file in the `bench` directory:

```bash
# Hugging Face Dataset Configuration
HF_DATASET_REPO=whitphx/transformersjs-performance-leaderboard-results-dev
HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxx

# Optional: Local storage directory
BENCHMARK_RESULTS_DIR=./benchmark-results

# Optional: Server port
PORT=7860
```
**Important**: Never commit `.env` to git. It's already in `.gitignore`.
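
As a rough sketch of how these variables might be consumed (assuming the server reads them via `process.env`, with `dotenv` loading `.env`; both are assumptions about the implementation):

```ts
import "dotenv/config"; // populate process.env from .env

const config = {
  hfDatasetRepo: process.env.HF_DATASET_REPO, // target dataset repo
  hfToken: process.env.HF_TOKEN,              // write-scoped token
  resultsDir: process.env.BENCHMARK_RESULTS_DIR ?? "./benchmark-results",
  port: Number(process.env.PORT ?? 7860),
};
```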
## Usage
Once configured, the server will automatically upload results:
```bash
# Start the server
npm run server

# You should see:
# 🤗 HF Dataset upload enabled: username/transformersjs-benchmark-results
```
When benchmarks complete, you'll see:
```
✅ Completed: abc-123 in 5.2s
✅ Benchmark abc-123 saved to file
✅ Uploaded to HF Dataset: feature-extraction/Xenova/all-MiniLM-L6-v2/node_warm_cpu_fp32_b1.json
```
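
For a sense of what the upload step could look like, here is a minimal sketch using the `@huggingface/hub` client's `uploadFile`. This is an illustration of the general approach, not the server's actual code, and option names may vary between library versions:

```ts
import { uploadFile } from "@huggingface/hub";

// Hypothetical wrapper: push one result JSON to the dataset repo.
async function uploadResult(path: string, record: unknown): Promise<void> {
  await uploadFile({
    repo: { type: "dataset", name: process.env.HF_DATASET_REPO! },
    accessToken: process.env.HF_TOKEN,
    file: {
      path, // e.g. "feature-extraction/Xenova/all-MiniLM-L6-v2/node_warm_cpu_fp32_b1.json"
      content: new Blob([JSON.stringify(record, null, 2)], { type: "application/json" }),
    },
  });
}
```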
## File Structure in HF Dataset
The dataset will have the same structure as local storage:
```
feature-extraction/
├── Xenova/
│   ├── all-MiniLM-L6-v2/
│   │   ├── node_warm_cpu_fp32_b1.json
│   │   ├── node_warm_webgpu_fp16_b1.json
│   │   └── web_warm_wasm_b1_chromium.json
│   └── distilbert-base-uncased/
│       └── node_warm_cpu_fp32_b1.json
text-classification/
└── Xenova/
    └── distilbert-base-uncased/
        └── node_warm_cpu_fp32_b1.json
```
## JSON Format
Each file contains a single benchmark result (not multiple runs):
```json
{
  "id": "abc-123-456",
  "platform": "node",
  "modelId": "Xenova/all-MiniLM-L6-v2",
  "task": "feature-extraction",
  "mode": "warm",
  "repeats": 3,
  "dtype": "fp32",
  "batchSize": 1,
  "device": "cpu",
  "timestamp": 1234567890,
  "status": "completed",
  "result": {
    "metrics": { ... },
    "environment": { ... }
  }
}
```
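
For reference, a possible TypeScript shape for these records, derived from the examples in this document (the type name and which fields are optional are our assumptions):

```ts
// Hypothetical type describing one uploaded benchmark record.
interface BenchmarkRecord {
  id: string;
  platform: "node" | "web";
  modelId: string;
  task: string;
  mode: string;
  repeats: number;
  dtype: string;
  batchSize: number;
  device: string;
  timestamp: number; // Unix time
  status: "completed" | "failed";
  error?: string;    // present on failed runs
  result: {
    metrics?: Record<string, unknown>;
    error?: { type: string; message: string; stage: string };
    environment: Record<string, unknown>;
  };
}
```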
## Behavior
### Overwriting Results
- Each benchmark configuration maps to a single file
- New results overwrite the existing file
- Only the latest result is kept per configuration
- This ensures the dataset always has current data
### Local vs Remote Storage
- **Local (JSONL)**: Keeps a history of all runs (append-only)
- **Remote (JSON)**: Keeps only the latest result (overwrite)

This dual approach allows:

- **Local**: Full history for analysis
- **Remote**: Clean, current results for leaderboards (see the sketch after this list)
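
A minimal sketch of the two write strategies, assuming Node's `fs` module (paths and function names are illustrative, not the server's actual code):

```ts
import { appendFileSync, writeFileSync } from "node:fs";

// Local JSONL: append one line per run, preserving the full history.
function saveLocal(file: string, record: object): void {
  appendFileSync(file, JSON.stringify(record) + "\n");
}

// Remote-style JSON: rewrite the whole file, replacing any previous result.
function saveLatest(file: string, record: object): void {
  writeFileSync(file, JSON.stringify(record, null, 2));
}
```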
### Failed Benchmarks
Failed benchmarks are also uploaded to track:
- Which models/configs have issues
- Error types (memory errors, etc.)
- Environmental context
Example failed result:
```json
{
  "id": "def-456-789",
  "status": "failed",
  "error": "Benchmark failed with code 1: ...",
  "result": {
    "error": {
      "type": "memory_error",
      "message": "Aborted(). Build with -sASSERTIONS for more info.",
      "stage": "load"
    },
    "environment": { ... }
  }
}
```
## Git Commits
Each upload creates a git commit in the dataset with:
```
Update benchmark: Xenova/all-MiniLM-L6-v2 (node/feature-extraction)

Benchmark ID: abc-123-456
Status: completed
Timestamp: 2025-10-13T06:48:57.481Z
```
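
If the upload goes through `@huggingface/hub` as sketched earlier, its `uploadFile` accepts `commitTitle` and `commitDescription` options. A sketch of composing them from the `BenchmarkRecord` shape above (helper names are ours; the string layout is inferred from the example):

```ts
// Hypothetical helpers building the commit message for one upload.
function commitTitle(r: BenchmarkRecord): string {
  return `Update benchmark: ${r.modelId} (${r.platform}/${r.task})`;
}

function commitDescription(r: BenchmarkRecord): string {
  return [
    `Benchmark ID: ${r.id}`,
    `Status: ${r.status}`,
    `Timestamp: ${new Date(r.timestamp).toISOString()}`,
  ].join("\n");
}
```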
## Disabling Upload
To disable HF Dataset upload:
- Remove `HF_TOKEN` from `.env`, or
- Remove both `HF_DATASET_REPO` and `HF_TOKEN`
The server will show:
```
🤗 HF Dataset upload disabled (set HF_DATASET_REPO and HF_TOKEN to enable)
```
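
In other words, the feature is gated on both variables being present. A plausible check (our sketch, not the actual code):

```ts
// Upload is enabled only when both variables are set.
const uploadEnabled = Boolean(process.env.HF_DATASET_REPO && process.env.HF_TOKEN);

console.log(
  uploadEnabled
    ? `🤗 HF Dataset upload enabled: ${process.env.HF_DATASET_REPO}`
    : "🤗 HF Dataset upload disabled (set HF_DATASET_REPO and HF_TOKEN to enable)",
);
```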
## Error Handling
If HF upload fails:
- The error is logged but doesn't fail the benchmark
- Local storage still succeeds
- You can retry manually or fix configuration
Example error:
```
❌ Failed to upload benchmark abc-123 to HF Dataset: Authentication failed
```
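
A sketch of this fail-soft pattern, reusing the hypothetical `saveLocal` and `uploadResult` helpers from earlier in this document:

```ts
// Save locally first; an upload failure is logged but never thrown.
async function persistResult(path: string, record: BenchmarkRecord): Promise<void> {
  saveLocal("./benchmark-results/results.jsonl", record); // local storage always runs
  try {
    await uploadResult(path, record);
  } catch (err) {
    console.error(`❌ Failed to upload benchmark ${record.id} to HF Dataset:`, err);
    // The benchmark still counts as succeeded; retry manually or fix configuration.
  }
}
```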
## API Endpoint (Future)
Currently uploads happen automatically. In the future, we could add:
```bash
# Manually trigger upload of a specific result
POST /api/benchmark/:id/upload

# Re-upload all local results to HF Dataset
POST /api/benchmarks/sync
```
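
If that were added, an Express-style handler might look like the following. This is purely hypothetical (neither endpoint exists yet) and reuses the `BenchmarkRecord`, `buildResultPath`, and `uploadResult` sketches from earlier:

```ts
import express from "express";

const app = express();
const localResults = new Map<string, BenchmarkRecord>(); // assumed in-memory index of stored results

// Hypothetical future endpoint: re-upload one stored result by ID.
app.post("/api/benchmark/:id/upload", async (req, res) => {
  const record = localResults.get(req.params.id);
  if (!record) {
    res.status(404).json({ error: "unknown benchmark id" });
    return;
  }
  await uploadResult(buildResultPath(record), record);
  res.json({ uploaded: true });
});
```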
## Development vs Production
Use different dataset repositories for development and production:
**Development** (`.env`):

```bash
HF_DATASET_REPO=whitphx/transformersjs-performance-leaderboard-results-dev
```

**Production** (deployed environment):

```bash
HF_DATASET_REPO=whitphx/transformersjs-performance-leaderboard-results
```
This allows testing without polluting production data.