Hugging Face Dataset Integration

The benchmark server can automatically upload results to a Hugging Face Dataset repository for centralized storage and sharing.

Features

  • Automatic Upload: Results are automatically pushed to HF Dataset when benchmarks complete
  • File Structure Preservation: Uses the same path structure as local storage: {task}/{org}/{model}/{params}.json (see the sketch after this list)
  • JSON Format: Results are stored as JSON (not JSONL) for better Dataset compatibility
  • Overwrite Strategy: Each configuration gets a single file that is overwritten with the latest result
  • Error Tracking: Failed benchmarks are also uploaded to track issues
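
For illustration, a configuration could map to its repo path roughly as follows. This is a hypothetical TypeScript sketch (BenchmarkConfig and buildResultPath are illustrative names, not the server's actual code); the field names mirror the JSON format shown later in this document:

// Hypothetical sketch: derive the repo path for a benchmark configuration.
interface BenchmarkConfig {
  task: string;       // e.g. "feature-extraction"
  modelId: string;    // "org/model", e.g. "Xenova/all-MiniLM-L6-v2"
  platform: string;   // "node" | "web"
  mode: string;       // "warm" | "cold"
  device: string;     // "cpu", "webgpu", ...
  dtype: string;      // "fp32", "fp16", ...
  batchSize: number;
}

// Node-style params encoding; web runs encode different params (e.g. a
// browser suffix such as "_chromium"), which is omitted here for brevity.
function buildResultPath(c: BenchmarkConfig): string {
  const params = [c.platform, c.mode, c.device, c.dtype, `b${c.batchSize}`].join("_");
  return `${c.task}/${c.modelId}/${params}.json`; // {task}/{org}/{model}/{params}.json
}

// buildResultPath({ task: "feature-extraction", modelId: "Xenova/all-MiniLM-L6-v2",
//   platform: "node", mode: "warm", device: "cpu", dtype: "fp32", batchSize: 1 })
//   => "feature-extraction/Xenova/all-MiniLM-L6-v2/node_warm_cpu_fp32_b1.json"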

Setup

1. Create a Hugging Face Dataset

  1. Go to https://huggingface.co/new-dataset
  2. Create a new dataset (e.g., username/transformersjs-benchmark-results)
  3. Make it public or private, depending on your needs

2. Get Your HF Token

  1. Go to https://huggingface.co/settings/tokens
  2. Create a new token with write permissions
  3. Copy the token

3. Configure Environment Variables

Create or update the .env file in the bench directory:

# Hugging Face Dataset Configuration
HF_DATASET_REPO=whitphx/transformersjs-performance-leaderboard-results-dev
HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxx

# Optional: Local storage directory
BENCHMARK_RESULTS_DIR=./benchmark-results

# Optional: Server port
PORT=7860
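
For reference, the gating logic presumably boils down to checking that both variables are set. A sketch (not the actual implementation), assuming dotenv or similar has already loaded .env into process.env:

// Sketch: HF Dataset upload is enabled only when both variables are set.
const repo = process.env.HF_DATASET_REPO;
const token = process.env.HF_TOKEN;
const uploadEnabled = Boolean(repo && token);

console.log(
  uploadEnabled
    ? `📤 HF Dataset upload enabled: ${repo}`
    : "📤 HF Dataset upload disabled (set HF_DATASET_REPO and HF_TOKEN to enable)"
);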

Important: Never commit .env to git. It's already in .gitignore.

Usage

Once configured, the server will automatically upload results:

# Start the server
npm run server

# You should see:
# 📤 HF Dataset upload enabled: username/transformersjs-benchmark-results

When benchmarks complete, you'll see:

✅ Completed: abc-123 in 5.2s
✓ Benchmark abc-123 saved to file
✓ Uploaded to HF Dataset: feature-extraction/Xenova/all-MiniLM-L6-v2/node_warm_cpu_fp32_b1.json
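
Under the hood, an upload like this can be done with the @huggingface/hub npm package's uploadFile() helper. The sketch below is illustrative rather than the server's actual code; it reuses the buildResultPath() sketch from the Features section, and BenchmarkResult refers to the type sketched under "JSON Format" below:

import { uploadFile } from "@huggingface/hub";

// Illustrative sketch: push one result to the dataset repo, overwriting
// any previous file at the same path.
async function uploadResult(result: BenchmarkResult): Promise<void> {
  const path = buildResultPath(result);
  await uploadFile({
    repo: { type: "dataset", name: process.env.HF_DATASET_REPO! },
    accessToken: process.env.HF_TOKEN,
    file: {
      path,
      // JSON (not JSONL): one pretty-printed result object per file.
      content: new Blob([JSON.stringify(result, null, 2)]),
    },
    commitTitle: `Update benchmark: ${result.modelId} (${result.platform}/${result.task})`,
  });
  console.log(`✓ Uploaded to HF Dataset: ${path}`);
}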

File Structure in HF Dataset

The dataset mirrors the directory layout of local storage:

feature-extraction/
├── Xenova/
│   ├── all-MiniLM-L6-v2/
│   │   ├── node_warm_cpu_fp32_b1.json
│   │   ├── node_warm_webgpu_fp16_b1.json
│   │   └── web_warm_wasm_b1_chromium.json
│   └── distilbert-base-uncased/
│       └── node_warm_cpu_fp32_b1.json
text-classification/
└── Xenova/
    └── distilbert-base-uncased/
        └── node_warm_cpu_fp32_b1.json

JSON Format

Each file contains a single benchmark result (not multiple runs):

{
  "id": "abc-123-456",
  "platform": "node",
  "modelId": "Xenova/all-MiniLM-L6-v2",
  "task": "feature-extraction",
  "mode": "warm",
  "repeats": 3,
  "dtype": "fp32",
  "batchSize": 1,
  "device": "cpu",
  "timestamp": 1234567890,
  "status": "completed",
  "result": {
    "metrics": { ... },
    "environment": { ... }
  }
}
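
Expressed as a TypeScript type, the shape above corresponds roughly to the following (inferred from the example; the server may define it differently):

// Illustrative type inferred from the JSON example above.
interface BenchmarkResult {
  id: string;
  platform: "node" | "web";
  modelId: string;
  task: string;
  mode: "warm" | "cold";
  repeats: number;
  dtype: string;
  batchSize: number;
  device: string;
  timestamp: number;                 // Unix epoch
  status: "completed" | "failed";
  error?: string;                    // present on failed runs (see below)
  result: {
    metrics?: Record<string, unknown>;
    environment?: Record<string, unknown>;
    error?: { type: string; message: string; stage: string };
  };
}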

Behavior

Overwriting Results

  • Each benchmark configuration maps to a single file
  • New results overwrite the existing file
  • Only the latest result is kept per configuration
  • This ensures the dataset always has current data

Local vs Remote Storage

  • Local (JSONL): Keeps history of all runs (append-only)
  • Remote (JSON): Keeps only latest result (overwrite)

This dual approach, sketched below, allows:

  • Local: Full history for analysis
  • Remote: Clean, current results for leaderboards
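
A sketch of the two write paths (helper names are illustrative; saveLocal() assumes local files mirror the dataset layout with a .jsonl extension):

import * as fs from "node:fs";
import * as path from "node:path";

// Local: append-only JSONL keeps the full history of runs.
function saveLocal(result: BenchmarkResult, baseDir: string): void {
  const file = path.join(baseDir, buildResultPath(result).replace(/\.json$/, ".jsonl"));
  fs.mkdirSync(path.dirname(file), { recursive: true });
  fs.appendFileSync(file, JSON.stringify(result) + "\n");
}

// Remote: the uploadResult() sketch above writes a single JSON file per
// configuration, so each upload replaces the previous result.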

Failed Benchmarks

Failed benchmarks are also uploaded to track:

  • Which models/configs have issues
  • Error types (memory errors, etc.)
  • Environmental context

Example failed result:

{
  "id": "def-456-789",
  "status": "failed",
  "error": "Benchmark failed with code 1: ...",
  "result": {
    "error": {
      "type": "memory_error",
      "message": "Aborted(). Build with -sASSERTIONS for more info.",
      "stage": "load"
    },
    "environment": { ... }
  }
}

Git Commits

Each upload creates a git commit in the dataset with a message like:

Update benchmark: Xenova/all-MiniLM-L6-v2 (node/feature-extraction)

Benchmark ID: abc-123-456
Status: completed
Timestamp: 2025-10-13T06:48:57.481Z
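
A sketch of how such a message might be assembled and passed as the commitTitle / commitDescription options of uploadFile() (illustrative; assumes timestamp is a millisecond Unix epoch):

// Illustrative: build the commit title and description for one upload.
function commitMessageFor(result: BenchmarkResult) {
  return {
    commitTitle: `Update benchmark: ${result.modelId} (${result.platform}/${result.task})`,
    commitDescription: [
      `Benchmark ID: ${result.id}`,
      `Status: ${result.status}`,
      // Assumes `timestamp` is a millisecond Unix epoch.
      `Timestamp: ${new Date(result.timestamp).toISOString()}`,
    ].join("\n"),
  };
}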

Disabling Upload

To disable HF Dataset upload, remove HF_TOKEN from .env (or remove both HF_DATASET_REPO and HF_TOKEN). Upload is only enabled when both variables are set.

The server will show:

📤 HF Dataset upload disabled (set HF_DATASET_REPO and HF_TOKEN to enable)

Error Handling

If HF upload fails:

  • The error is logged but doesn't fail the benchmark
  • Local storage still succeeds
  • You can fix the configuration and retry manually (see the sketch below)

Example error:

✗ Failed to upload benchmark abc-123 to HF Dataset: Authentication failed
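
In code, making the upload non-fatal is a matter of catching and logging. A sketch combining the saveLocal() and uploadResult() helpers from earlier:

// Sketch: local storage first, so a remote failure never loses the result.
async function persistResult(result: BenchmarkResult): Promise<void> {
  saveLocal(result, process.env.BENCHMARK_RESULTS_DIR ?? "./benchmark-results");
  try {
    await uploadResult(result);
  } catch (err) {
    // Non-fatal: log and move on; the benchmark itself still counts as done.
    console.error(`✗ Failed to upload benchmark ${result.id} to HF Dataset:`, err);
  }
}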

API Endpoint (Future)

Currently uploads happen automatically. In the future, we could add:

# Manually trigger upload of a specific result
POST /api/benchmark/:id/upload

# Re-upload all local results to HF Dataset
POST /api/benchmarks/sync
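
If these were added, a handler for the first endpoint might look roughly like this. It is entirely speculative: loadResult() is an assumed lookup helper, and the route shape simply mirrors the proposal above:

import express from "express";

// Assumed helper: look up a locally stored result by benchmark ID.
declare function loadResult(id: string): Promise<BenchmarkResult | null>;

const app = express();

// Hypothetical: manually re-upload a specific stored result.
app.post("/api/benchmark/:id/upload", async (req, res) => {
  const result = await loadResult(req.params.id);
  if (!result) {
    res.status(404).json({ error: `No result for ${req.params.id}` });
    return;
  }
  try {
    await uploadResult(result); // the upload sketch from earlier
    res.json({ uploaded: true });
  } catch (err) {
    res.status(502).json({ error: String(err) });
  }
});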

Development vs Production

Use different dataset repositories for development and production:

Development (.env):

HF_DATASET_REPO=whitphx/transformersjs-performance-leaderboard-results-dev

Production (deployed environment):

HF_DATASET_REPO=whitphx/transformersjs-performance-leaderboard-results

This allows testing without polluting production data.