Hugging Face Dataset Integration

The benchmark server can automatically upload results to a Hugging Face Dataset repository for centralized storage and sharing.

Features

  • Automatic Upload: Results are automatically pushed to HF Dataset when benchmarks complete
  • File Structure Preservation: Uses the same path structure as local storage: {task}/{org}/{model}/{params}.json (see the sketch after this list)
  • JSON Format: Results are stored as JSON (not JSONL) for better Dataset compatibility
  • Overwrite Strategy: Each configuration gets a single file that is overwritten with the latest result
  • Error Tracking: Failed benchmarks are also uploaded to track issues
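
For illustration, a configuration could map to its repo path roughly as follows. This is a hypothetical TypeScript sketch (BenchmarkConfig and buildResultPath are illustrative names, not the server's actual code); the field names mirror the JSON format shown later in this document:

// Hypothetical sketch: derive the repo path for a benchmark configuration.
interface BenchmarkConfig {
  task: string;       // e.g. "feature-extraction"
  modelId: string;    // "org/model", e.g. "Xenova/all-MiniLM-L6-v2"
  platform: string;   // "node" | "web"
  mode: string;       // "warm" | "cold"
  device: string;     // "cpu", "webgpu", ...
  dtype: string;      // "fp32", "fp16", ...
  batchSize: number;
}

// Node-style params encoding; web runs encode different params (e.g. a
// browser suffix such as "_chromium"), which is omitted here for brevity.
function buildResultPath(c: BenchmarkConfig): string {
  const params = [c.platform, c.mode, c.device, c.dtype, `b${c.batchSize}`].join("_");
  return `${c.task}/${c.modelId}/${params}.json`; // {task}/{org}/{model}/{params}.json
}

// buildResultPath({ task: "feature-extraction", modelId: "Xenova/all-MiniLM-L6-v2",
//   platform: "node", mode: "warm", device: "cpu", dtype: "fp32", batchSize: 1 })
//   => "feature-extraction/Xenova/all-MiniLM-L6-v2/node_warm_cpu_fp32_b1.json"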

Setup

1. Create a Hugging Face Dataset

  1. Go to https://huggingface.co/new-dataset
  2. Create a new dataset (e.g., username/transformersjs-benchmark-results)
  3. Make it public or private, depending on your needs

2. Get Your HF Token

  1. Go to https://huggingface.co/settings/tokens
  2. Create a new token with write permissions
  3. Copy the token

3. Configure Environment Variables

Create or update the .env file in the bench directory:

# Hugging Face Dataset Configuration
HF_DATASET_REPO=whitphx/transformersjs-performance-leaderboard-results-dev
HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxx

# Optional: Local storage directory
BENCHMARK_RESULTS_DIR=./benchmark-results

# Optional: Server port
PORT=7860
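
For reference, the gating logic presumably boils down to checking that both variables are set. A sketch (not the actual implementation), assuming dotenv or similar has already loaded .env into process.env:

// Sketch: HF Dataset upload is enabled only when both variables are set.
const repo = process.env.HF_DATASET_REPO;
const token = process.env.HF_TOKEN;
const uploadEnabled = Boolean(repo && token);

console.log(
  uploadEnabled
    ? `📤 HF Dataset upload enabled: ${repo}`
    : "📤 HF Dataset upload disabled (set HF_DATASET_REPO and HF_TOKEN to enable)"
);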

Important: Never commit .env to git. It's already in .gitignore.

Usage

Once configured, the server will automatically upload results:

# Start the server
npm run server

# You should see:
# 📤 HF Dataset upload enabled: username/transformersjs-benchmark-results

When benchmarks complete, you'll see:

✅ Completed: abc-123 in 5.2s
✓ Benchmark abc-123 saved to file
✓ Uploaded to HF Dataset: feature-extraction/Xenova/all-MiniLM-L6-v2/node_warm_cpu_fp32_b1.json
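
Under the hood, an upload like this can be done with the @huggingface/hub npm package's uploadFile() helper. The sketch below is illustrative rather than the server's actual code; it reuses the buildResultPath() sketch from the Features section, and BenchmarkResult refers to the type sketched under "JSON Format" below:

import { uploadFile } from "@huggingface/hub";

// Illustrative sketch: push one result to the dataset repo, overwriting
// any previous file at the same path.
async function uploadResult(result: BenchmarkResult): Promise<void> {
  const path = buildResultPath(result);
  await uploadFile({
    repo: { type: "dataset", name: process.env.HF_DATASET_REPO! },
    accessToken: process.env.HF_TOKEN,
    file: {
      path,
      // JSON (not JSONL): one pretty-printed result object per file.
      content: new Blob([JSON.stringify(result, null, 2)]),
    },
    commitTitle: `Update benchmark: ${result.modelId} (${result.platform}/${result.task})`,
  });
  console.log(`✓ Uploaded to HF Dataset: ${path}`);
}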

File Structure in HF Dataset

The dataset mirrors the directory layout of local storage:

feature-extraction/
├── Xenova/
│   ├── all-MiniLM-L6-v2/
│   │   ├── node_warm_cpu_fp32_b1.json
│   │   ├── node_warm_webgpu_fp16_b1.json
│   │   └── web_warm_wasm_b1_chromium.json
│   └── distilbert-base-uncased/
│       └── node_warm_cpu_fp32_b1.json
text-classification/
└── Xenova/
    └── distilbert-base-uncased/
        └── node_warm_cpu_fp32_b1.json

JSON Format

Each file contains a single benchmark result (not multiple runs):

{
  "id": "abc-123-456",
  "platform": "node",
  "modelId": "Xenova/all-MiniLM-L6-v2",
  "task": "feature-extraction",
  "mode": "warm",
  "repeats": 3,
  "dtype": "fp32",
  "batchSize": 1,
  "device": "cpu",
  "timestamp": 1234567890,
  "status": "completed",
  "result": {
    "metrics": { ... },
    "environment": { ... }
  }
}
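
Expressed as a TypeScript type, the shape above corresponds roughly to the following (inferred from the example; the server may define it differently):

// Illustrative type inferred from the JSON example above.
interface BenchmarkResult {
  id: string;
  platform: "node" | "web";
  modelId: string;
  task: string;
  mode: "warm" | "cold";
  repeats: number;
  dtype: string;
  batchSize: number;
  device: string;
  timestamp: number;                 // Unix epoch
  status: "completed" | "failed";
  error?: string;                    // present on failed runs (see below)
  result: {
    metrics?: Record<string, unknown>;
    environment?: Record<string, unknown>;
    error?: { type: string; message: string; stage: string };
  };
}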

Behavior

Overwriting Results

  • Each benchmark configuration maps to a single file
  • New results overwrite the existing file
  • Only the latest result is kept per configuration
  • This ensures the dataset always has current data

Local vs Remote Storage

  • Local (JSONL): Keeps history of all runs (append-only)
  • Remote (JSON): Keeps only latest result (overwrite)

This dual approach, sketched below, allows:

  • Local: Full history for analysis
  • Remote: Clean, current results for leaderboards
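
A sketch of the two write paths (helper names are illustrative; saveLocal() assumes local files mirror the dataset layout with a .jsonl extension):

import * as fs from "node:fs";
import * as path from "node:path";

// Local: append-only JSONL keeps the full history of runs.
function saveLocal(result: BenchmarkResult, baseDir: string): void {
  const file = path.join(baseDir, buildResultPath(result).replace(/\.json$/, ".jsonl"));
  fs.mkdirSync(path.dirname(file), { recursive: true });
  fs.appendFileSync(file, JSON.stringify(result) + "\n");
}

// Remote: the uploadResult() sketch above writes a single JSON file per
// configuration, so each upload replaces the previous result.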

Failed Benchmarks

Failed benchmarks are also uploaded to track:

  • Which models/configs have issues
  • Error types (memory errors, etc.)
  • Environmental context

Example failed result:

{
  "id": "def-456-789",
  "status": "failed",
  "error": "Benchmark failed with code 1: ...",
  "result": {
    "error": {
      "type": "memory_error",
      "message": "Aborted(). Build with -sASSERTIONS for more info.",
      "stage": "load"
    },
    "environment": { ... }
  }
}

Git Commits

Each upload creates a git commit in the dataset with a message like:

Update benchmark: Xenova/all-MiniLM-L6-v2 (node/feature-extraction)

Benchmark ID: abc-123-456
Status: completed
Timestamp: 2025-10-13T06:48:57.481Z
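
A sketch of how such a message might be assembled and passed as the commitTitle / commitDescription options of uploadFile() (illustrative; assumes timestamp is a millisecond Unix epoch):

// Illustrative: build the commit title and description for one upload.
function commitMessageFor(result: BenchmarkResult) {
  return {
    commitTitle: `Update benchmark: ${result.modelId} (${result.platform}/${result.task})`,
    commitDescription: [
      `Benchmark ID: ${result.id}`,
      `Status: ${result.status}`,
      // Assumes `timestamp` is a millisecond Unix epoch.
      `Timestamp: ${new Date(result.timestamp).toISOString()}`,
    ].join("\n"),
  };
}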

Disabling Upload

To disable HF Dataset upload, remove HF_TOKEN from .env (or remove both HF_DATASET_REPO and HF_TOKEN). Upload is only enabled when both variables are set.

The server will show:

📤 HF Dataset upload disabled (set HF_DATASET_REPO and HF_TOKEN to enable)

Error Handling

If HF upload fails:

  • The error is logged but doesn't fail the benchmark
  • Local storage still succeeds
  • You can fix the configuration and retry manually (see the sketch below)

Example error:

✗ Failed to upload benchmark abc-123 to HF Dataset: Authentication failed
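
In code, making the upload non-fatal is a matter of catching and logging. A sketch combining the saveLocal() and uploadResult() helpers from earlier:

// Sketch: local storage first, so a remote failure never loses the result.
async function persistResult(result: BenchmarkResult): Promise<void> {
  saveLocal(result, process.env.BENCHMARK_RESULTS_DIR ?? "./benchmark-results");
  try {
    await uploadResult(result);
  } catch (err) {
    // Non-fatal: log and move on; the benchmark itself still counts as done.
    console.error(`✗ Failed to upload benchmark ${result.id} to HF Dataset:`, err);
  }
}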

API Endpoint (Future)

Currently uploads happen automatically. In the future, we could add:

# Manually trigger upload of a specific result
POST /api/benchmark/:id/upload

# Re-upload all local results to HF Dataset
POST /api/benchmarks/sync
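
If these were added, a handler for the first endpoint might look roughly like this. It is entirely speculative: loadResult() is an assumed lookup helper, and the route shape simply mirrors the proposal above:

import express from "express";

// Assumed helper: look up a locally stored result by benchmark ID.
declare function loadResult(id: string): Promise<BenchmarkResult | null>;

const app = express();

// Hypothetical: manually re-upload a specific stored result.
app.post("/api/benchmark/:id/upload", async (req, res) => {
  const result = await loadResult(req.params.id);
  if (!result) {
    res.status(404).json({ error: `No result for ${req.params.id}` });
    return;
  }
  try {
    await uploadResult(result); // the upload sketch from earlier
    res.json({ uploaded: true });
  } catch (err) {
    res.status(502).json({ error: String(err) });
  }
});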

Development vs Production

Use different dataset repositories for development and production:

Development (.env):

HF_DATASET_REPO=whitphx/transformersjs-performance-leaderboard-results-dev

Production (deployed environment):

HF_DATASET_REPO=whitphx/transformersjs-performance-leaderboard-results

This allows testing without polluting production data.