# Hugging Face Dataset Integration

The benchmark server can automatically upload results to a Hugging Face Dataset repository for centralized storage and sharing.

## Features

- **Automatic Upload**: Results are pushed to the HF Dataset as soon as a benchmark completes
- **File Structure Preservation**: Remote files use the same path structure as local storage: `{task}/{org}/{model}/{params}.json`
- **JSON Format**: Results are stored as JSON (not JSONL) for better Dataset compatibility
- **Overwrite Strategy**: Each configuration gets a single file that is overwritten with the latest result
- **Error Tracking**: Failed benchmarks are also uploaded to track issues

## Setup

### 1. Create a Hugging Face Dataset

1. Go to https://huggingface.co/new-dataset
2. Create a new dataset (e.g., `username/transformersjs-benchmark-results`)
3. Make it public or private depending on your needs

### 2. Get Your HF Token

1. Go to https://huggingface.co/settings/tokens
2. Create a new token with `write` permissions
3. Copy the token

### 3. Configure Environment Variables

Create or update the `.env` file in the `bench` directory:

```bash
# Hugging Face Dataset Configuration
HF_DATASET_REPO=whitphx/transformersjs-performance-leaderboard-results-dev
HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxx

# Optional: Local storage directory
BENCHMARK_RESULTS_DIR=./benchmark-results

# Optional: Server port
PORT=7860
```

**Important**: Never commit `.env` to git. It's already in `.gitignore`.

## Usage

Once configured, the server will automatically upload results:

```bash
# Start the server
npm run server

# You should see:
# πŸ“€ HF Dataset upload enabled: username/transformersjs-benchmark-results
```

When benchmarks complete, you'll see:

```
βœ… Completed: abc-123 in 5.2s
βœ“ Benchmark abc-123 saved to file
βœ“ Uploaded to HF Dataset: feature-extraction/Xenova/all-MiniLM-L6-v2/node_warm_cpu_fp32_b1.json
```

## File Structure in HF Dataset

The dataset will have the same structure as local storage:

```
feature-extraction/
β”œβ”€β”€ Xenova/
β”‚   β”œβ”€β”€ all-MiniLM-L6-v2/
β”‚   β”‚   β”œβ”€β”€ node_warm_cpu_fp32_b1.json
β”‚   β”‚   β”œβ”€β”€ node_warm_webgpu_fp16_b1.json
β”‚   β”‚   └── web_warm_wasm_b1_chromium.json
β”‚   └── distilbert-base-uncased/
β”‚       └── node_warm_cpu_fp32_b1.json
text-classification/
└── Xenova/
    └── distilbert-base-uncased/
        └── node_warm_cpu_fp32_b1.json
```
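
The leaf file name encodes the run parameters. Below is a minimal sketch of how such a path could be derived from a benchmark record; the `BenchmarkConfig` shape and `remotePath` helper are illustrative, not the server's actual code (browser runs additionally encode the browser name, e.g. `_chromium`).

```ts
// Illustrative only: derives the {task}/{org}/{model}/{params}.json path
// from a benchmark record. Field names mirror the JSON format shown below.
interface BenchmarkConfig {
  task: string;      // e.g. "feature-extraction"
  modelId: string;   // "{org}/{model}", e.g. "Xenova/all-MiniLM-L6-v2"
  platform: string;  // "node" or "web"
  mode: string;      // e.g. "warm"
  device: string;    // e.g. "cpu", "webgpu", "wasm"
  dtype: string;     // e.g. "fp32", "fp16"
  batchSize: number;
}

function remotePath(c: BenchmarkConfig): string {
  const params = [c.platform, c.mode, c.device, c.dtype, `b${c.batchSize}`].join("_");
  return `${c.task}/${c.modelId}/${params}.json`;
}

// remotePath({ task: "feature-extraction", modelId: "Xenova/all-MiniLM-L6-v2",
//              platform: "node", mode: "warm", device: "cpu", dtype: "fp32", batchSize: 1 })
// => "feature-extraction/Xenova/all-MiniLM-L6-v2/node_warm_cpu_fp32_b1.json"
```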

## JSON Format

Each file contains a single benchmark result (not multiple runs):

```json
{
  "id": "abc-123-456",
  "platform": "node",
  "modelId": "Xenova/all-MiniLM-L6-v2",
  "task": "feature-extraction",
  "mode": "warm",
  "repeats": 3,
  "dtype": "fp32",
  "batchSize": 1,
  "device": "cpu",
  "timestamp": 1234567890,
  "status": "completed",
  "result": {
    "metrics": { ... },
    "environment": { ... }
  }
}
```
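
For reference, the same record expressed as a TypeScript type. This is an illustrative sketch based on the fields above; the server's actual type definitions may differ.

```ts
// Illustrative shape of a stored result; "result" is kept loose because
// metrics and environment details vary by task and platform.
interface StoredBenchmarkResult {
  id: string;
  platform: "node" | "web";
  modelId: string;
  task: string;
  mode: string;      // e.g. "warm"
  repeats: number;
  dtype: string;
  batchSize: number;
  device: string;
  timestamp: number;
  status: "completed" | "failed";
  error?: string;    // present on failed benchmarks (see below)
  result: {
    metrics?: Record<string, unknown>;
    error?: { type: string; message: string; stage: string };
    environment?: Record<string, unknown>;
  };
}
```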

## Behavior

### Overwriting Results

- Each benchmark configuration maps to a single file
- New results **overwrite** the existing file
- Only the **latest** result is kept per configuration
- This ensures the dataset always has current data

### Local vs Remote Storage

- **Local (JSONL)**: Keeps history of all runs (append-only)
- **Remote (JSON)**: Keeps only latest result (overwrite)

This dual approach provides:
- Local: full history for analysis
- Remote: clean, current results for leaderboards
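
A minimal sketch of that split, assuming a local results directory laid out like the HF Dataset (the helper names are illustrative, not the server's code):

```ts
import { promises as fs } from "node:fs";
import path from "node:path";

// Local: append the record to a per-configuration JSONL file (full history).
async function appendLocal(resultsDir: string, relPath: string, record: object): Promise<void> {
  const file = path.join(resultsDir, relPath.replace(/\.json$/, ".jsonl"));
  await fs.mkdir(path.dirname(file), { recursive: true });
  await fs.appendFile(file, JSON.stringify(record) + "\n");
}

// Remote: the whole JSON file is rewritten, so only the latest result survives.
function serializeForRemote(record: object): string {
  return JSON.stringify(record, null, 2);
}
```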

### Failed Benchmarks

Failed benchmarks are also uploaded to track:
- Which models/configs have issues
- Error types (memory errors, etc.)
- Environmental context

Example failed result:

```json
{
  "id": "def-456-789",
  "status": "failed",
  "error": "Benchmark failed with code 1: ...",
  "result": {
    "error": {
      "type": "memory_error",
      "message": "Aborted(). Build with -sASSERTIONS for more info.",
      "stage": "load"
    },
    "environment": { ... }
  }
}
```

## Git Commits

Each upload creates a git commit in the dataset with:

```
Update benchmark: Xenova/all-MiniLM-L6-v2 (node/feature-extraction)

Benchmark ID: abc-123-456
Status: completed
Timestamp: 2025-10-13T06:48:57.481Z
```
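
If the server uses the `@huggingface/hub` JavaScript client, the upload with that commit message might look roughly like this. This is a sketch under that assumption, not the server's actual code; parameter shapes follow the library's documented API, so verify them against the installed version.

```ts
import { uploadFile } from "@huggingface/hub";

// Illustrative sketch: upload one result file and set the commit title/description.
async function pushResult(opts: {
  repoName: string;          // HF_DATASET_REPO
  token: string;             // HF_TOKEN
  repoPath: string;          // e.g. "feature-extraction/Xenova/all-MiniLM-L6-v2/node_warm_cpu_fp32_b1.json"
  json: string;              // serialized result
  commitTitle: string;       // "Update benchmark: ..."
  commitDescription: string; // "Benchmark ID: ...\nStatus: ...\nTimestamp: ..."
}): Promise<void> {
  await uploadFile({
    repo: { type: "dataset", name: opts.repoName },
    accessToken: opts.token,
    file: { path: opts.repoPath, content: new Blob([opts.json]) },
    commitTitle: opts.commitTitle,
    commitDescription: opts.commitDescription,
  });
}
```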

## Disabling Upload

To disable HF Dataset upload:

1. Remove `HF_TOKEN` from `.env`, or
2. Remove both `HF_DATASET_REPO` and `HF_TOKEN`

The server will show:

```
πŸ“€ HF Dataset upload disabled (set HF_DATASET_REPO and HF_TOKEN to enable)
```
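
In other words, uploading is gated on both variables being present; the check is roughly (illustrative sketch):

```ts
// Illustrative: upload is enabled only when both env vars are set.
const hfRepo = process.env.HF_DATASET_REPO;
const hfToken = process.env.HF_TOKEN;
const uploadEnabled = Boolean(hfRepo && hfToken);

console.log(
  uploadEnabled
    ? `πŸ“€ HF Dataset upload enabled: ${hfRepo}`
    : "πŸ“€ HF Dataset upload disabled (set HF_DATASET_REPO and HF_TOKEN to enable)"
);
```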

## Error Handling

If HF upload fails:
- The error is logged but doesn't fail the benchmark
- Local storage still succeeds
- You can retry manually or fix the configuration

Example error:

```
βœ— Failed to upload benchmark abc-123 to HF Dataset: Authentication failed
```
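
Conceptually this is just a try/catch around the upload, after local storage has already succeeded (illustrative sketch, not the server's actual code):

```ts
// Illustrative: local save happens first; an upload failure is logged, not rethrown.
async function saveAndUpload(
  id: string,
  saveLocally: () => Promise<void>,
  uploadToHf: () => Promise<void>,
): Promise<void> {
  await saveLocally(); // local storage succeeds regardless of the upload outcome
  try {
    await uploadToHf();
    console.log(`βœ“ Uploaded benchmark ${id} to HF Dataset`);
  } catch (err) {
    console.error(`βœ— Failed to upload benchmark ${id} to HF Dataset:`, err);
  }
}
```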

## API Endpoint (Future)

Currently uploads happen automatically. In the future, we could add:

```bash
# Manually trigger upload of a specific result
POST /api/benchmark/:id/upload

# Re-upload all local results to HF Dataset
POST /api/benchmarks/sync
```

## Development vs Production

Use different dataset repositories for development and production:

**Development** (`.env`):
```bash
HF_DATASET_REPO=whitphx/transformersjs-performance-leaderboard-results-dev
```

**Production** (deployed environment):
```bash
HF_DATASET_REPO=whitphx/transformersjs-performance-leaderboard-results
```

This allows testing without polluting production data.