# Hugging Face Dataset Integration

The benchmark server can automatically upload results to a Hugging Face Dataset repository for centralized storage and sharing.

## Features

- **Automatic Upload**: Results are automatically pushed to the HF Dataset when benchmarks complete
- **File Structure Preservation**: Uses the same path structure: `{task}/{org}/{model}/{params}.json`
- **JSON Format**: Results are stored as JSON (not JSONL) for better Dataset compatibility
- **Overwrite Strategy**: Each configuration gets a single file that is overwritten with the latest result
- **Error Tracking**: Failed benchmarks are also uploaded to track issues

## Setup

### 1. Create a Hugging Face Dataset

1. Go to https://huggingface.co/new-dataset
2. Create a new dataset (e.g., `username/transformersjs-benchmark-results`)
3. Keep it public or private based on your needs

### 2. Get Your HF Token

1. Go to https://huggingface.co/settings/tokens
2. Create a new token with `write` permissions
3. Copy the token

### 3. Configure Environment Variables

Create or update the `.env` file in the `bench` directory:

```bash
# Hugging Face Dataset Configuration
HF_DATASET_REPO=whitphx/transformersjs-performance-leaderboard-results-dev
HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxx

# Optional: Local storage directory
BENCHMARK_RESULTS_DIR=./benchmark-results

# Optional: Server port
PORT=7860
```

**Important**: Never commit `.env` to git. It's already in `.gitignore`.

## Usage

Once configured, the server will automatically upload results:

```bash
# Start the server
npm run server

# You should see:
# 📤 HF Dataset upload enabled: username/transformersjs-benchmark-results
```

When benchmarks complete, you'll see:

```
✅ Completed: abc-123 in 5.2s
✓ Benchmark abc-123 saved to file
✓ Uploaded to HF Dataset: feature-extraction/Xenova/all-MiniLM-L6-v2/node_warm_cpu_fp32_b1.json
```

## File Structure in HF Dataset

The dataset will have the same structure as local storage:

```
feature-extraction/
├── Xenova/
│   ├── all-MiniLM-L6-v2/
│   │   ├── node_warm_cpu_fp32_b1.json
│   │   ├── node_warm_webgpu_fp16_b1.json
│   │   └── web_warm_wasm_b1_chromium.json
│   └── distilbert-base-uncased/
│       └── node_warm_cpu_fp32_b1.json
text-classification/
└── Xenova/
    └── distilbert-base-uncased/
        └── node_warm_cpu_fp32_b1.json
```

## JSON Format

Each file contains a single benchmark result (not multiple runs):

```json
{
  "id": "abc-123-456",
  "platform": "node",
  "modelId": "Xenova/all-MiniLM-L6-v2",
  "task": "feature-extraction",
  "mode": "warm",
  "repeats": 3,
  "dtype": "fp32",
  "batchSize": 1,
  "device": "cpu",
  "timestamp": 1234567890,
  "status": "completed",
  "result": {
    "metrics": { ... },
    "environment": { ... }
  }
}
```

## Behavior

### Overwriting Results

- Each benchmark configuration maps to a single file
- New results **overwrite** the existing file
- Only the **latest** result is kept per configuration
- This ensures the dataset always has current data

### Local vs Remote Storage

- **Local (JSONL)**: Keeps history of all runs (append-only)
- **Remote (JSON)**: Keeps only the latest result (overwrite)

This dual approach allows:

- Local: Full history for analysis
- Remote: Clean, current results for leaderboards

### Failed Benchmarks

Failed benchmarks are also uploaded to track:

- Which models/configs have issues
- Error types (memory errors, etc.)
- Environmental context

Example failed result:

```json
{
  "id": "def-456-789",
  "status": "failed",
  "error": "Benchmark failed with code 1: ...",
  "result": {
    "error": {
      "type": "memory_error",
      "message": "Aborted(). Build with -sASSERTIONS for more info.",
      "stage": "load"
    },
    "environment": { ... }
  }
}
```

## Git Commits

Each upload creates a git commit in the dataset with:

```
Update benchmark: Xenova/all-MiniLM-L6-v2 (node/feature-extraction)

Benchmark ID: abc-123-456
Status: completed
Timestamp: 2025-10-13T06:48:57.481Z
```
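For illustration, here is a minimal TypeScript sketch of what this upload step could look like using the `@huggingface/hub` client. The `BenchmarkResult` shape, the `params` argument, and the helper names are assumptions for this example rather than the server's actual code, and `uploadFile` option names may vary between library versions:

```typescript
// Hypothetical sketch of the upload step, assuming the @huggingface/hub client.
import { uploadFile } from "@huggingface/hub";

// Illustrative result shape; the real server's type may differ.
interface BenchmarkResult {
  id: string;
  platform: string;
  modelId: string; // e.g. "Xenova/all-MiniLM-L6-v2" (already "{org}/{model}")
  task: string;    // e.g. "feature-extraction"
  status: "completed" | "failed";
  timestamp: number;
  [key: string]: unknown;
}

// Derive the {task}/{org}/{model}/{params}.json path described above.
function resultPath(result: BenchmarkResult, params: string): string {
  return `${result.task}/${result.modelId}/${params}.json`;
}

async function uploadResult(result: BenchmarkResult, params: string): Promise<void> {
  // Serialize the single latest result as pretty-printed JSON.
  const content = new Blob([JSON.stringify(result, null, 2)], {
    type: "application/json",
  });

  await uploadFile({
    // HF_DATASET_REPO and HF_TOKEN are assumed to be set (see Setup above).
    repo: { type: "dataset", name: process.env.HF_DATASET_REPO! },
    accessToken: process.env.HF_TOKEN!,
    file: { path: resultPath(result, params), content },
    // Commit metadata matching the message format shown above.
    commitTitle: `Update benchmark: ${result.modelId} (${result.platform}/${result.task})`,
    commitDescription: [
      `Benchmark ID: ${result.id}`,
      `Status: ${result.status}`,
      `Timestamp: ${new Date(result.timestamp).toISOString()}`,
    ].join("\n"),
  });
}
```

Because each configuration maps to a fixed path, re-uploading the same configuration naturally overwrites the previous file, which is the overwrite strategy described above.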
## Disabling Upload

To disable HF Dataset upload:

1. Remove `HF_TOKEN` from `.env`, or
2. Remove both `HF_DATASET_REPO` and `HF_TOKEN`

The server will show:

```
📤 HF Dataset upload disabled (set HF_DATASET_REPO and HF_TOKEN to enable)
```

## Error Handling

If HF upload fails:

- The error is logged but doesn't fail the benchmark
- Local storage still succeeds
- You can retry manually or fix the configuration

Example error:

```
✗ Failed to upload benchmark abc-123 to HF Dataset: Authentication failed
```

## API Endpoint (Future)

Currently, uploads happen automatically. In the future, we could add:

```bash
# Manually trigger upload of a specific result
POST /api/benchmark/:id/upload

# Re-upload all local results to HF Dataset
POST /api/benchmarks/sync
```

## Development vs Production

Use different dataset repositories for development and production:

**Development** (`.env`):

```bash
HF_DATASET_REPO=whitphx/transformersjs-performance-leaderboard-results-dev
```

**Production** (deployed environment):

```bash
HF_DATASET_REPO=whitphx/transformersjs-performance-leaderboard-results
```

This allows testing without polluting production data.
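Putting the configuration behavior together, here is a minimal sketch of the startup check implied by the log lines above. The logic is assumed for illustration; the server's actual implementation may differ:

```typescript
// Hypothetical sketch: upload is enabled only when both variables are set.
const repo = process.env.HF_DATASET_REPO;
const token = process.env.HF_TOKEN;

if (repo && token) {
  console.log(`📤 HF Dataset upload enabled: ${repo}`);
} else {
  console.log(
    "📤 HF Dataset upload disabled (set HF_DATASET_REPO and HF_TOKEN to enable)"
  );
}
```

Since only `HF_DATASET_REPO` differs between environments, switching between the `-dev` and production datasets is a one-line `.env` change.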