# Benchmark ID and File Organization

## Overview

Benchmark results are organized using a deterministic ID system that groups results with identical settings into the same file. The ID is structured hierarchically with task at the top level, followed by model ID, and finally encoded parameters.

## Directory Structure

Results are stored in nested directories with task as the top level:

```
benchmark-results/
├── {task}/
│   └── {org}/
│       └── {model-name}/
│           ├── {params1}.jsonl
│           ├── {params2}.jsonl
│           └── {params3}.jsonl
```

## ID Format

**Full ID**: `{task}/{modelId}/{platform}_{mode}_{device}_{dtype}_{batch}_{browser}_{headed}`

### Components

1. **Task** (top-level directory): The transformers task
   - Examples: `feature-extraction`, `text-classification`, `text-generation`, `sentiment-analysis`
   - Rationale: Tasks are fundamentally different operations, so they form the primary organization

2. **Model ID** (nested directory path): Full model identifier with organization
   - Examples: `Xenova/all-MiniLM-L6-v2`, `meta-llama/Llama-2-7b`
   - Preserved as-is, including slashes for directory structure

3. **Platform**: `node` or `web`

4. **Mode**: `warm` or `cold`

5. **Device**: Execution device
   - Node.js: `cpu` (default), `webgpu`
   - Web: `webgpu` (default), `wasm`

6. **DType** (optional): Model data type
   - Examples: `fp32`, `fp16`, `q8`, `q4`, `int8`
   - Omitted if not specified

7. **Batch Size**: Always included as `b{N}`
   - Examples: `b1`, `b4`, `b32`

8. **Browser** (web only): Browser type
   - Examples: `chromium`, `firefox`, `webkit`
   - Omitted for Node.js benchmarks

9. **Headed** (web only): Display mode
   - Included as `headed` only if true
   - Omitted for headless mode or Node.js benchmarks
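
The component rules above can be sketched as a small function. This is a minimal illustration, not the project's actual implementation: the `BenchmarkSettings` interface, its field names, and `buildBenchmarkId` are hypothetical names introduced here.

```typescript
// Hypothetical settings shape; the field names are assumptions for illustration.
interface BenchmarkSettings {
  task: string;
  modelId: string; // e.g. "Xenova/all-MiniLM-L6-v2", slashes preserved
  platform: "node" | "web";
  mode: "warm" | "cold";
  device: string; // e.g. "cpu", "webgpu", "wasm"
  dtype?: string; // left out of the ID when not specified
  batchSize: number;
  browser?: string; // web only
  headed?: boolean; // web only
}

// Build "{task}/{modelId}/{params}" following the component rules above.
function buildBenchmarkId(s: BenchmarkSettings): string {
  const parts: string[] = [s.platform, s.mode, s.device];
  if (s.dtype) parts.push(s.dtype);
  parts.push("b" + s.batchSize); // batch size is always included
  if (s.platform === "web") {
    if (s.browser) parts.push(s.browser);
    if (s.headed) parts.push("headed"); // only when true
  }
  return s.task + "/" + s.modelId + "/" + parts.join("_");
}

console.log(
  buildBenchmarkId({
    task: "feature-extraction",
    modelId: "Xenova/distilbert-base-uncased",
    platform: "web",
    mode: "warm",
    device: "webgpu",
    dtype: "fp16",
    batchSize: 1,
    browser: "chromium",
    headed: true,
  })
);
// prints "feature-extraction/Xenova/distilbert-base-uncased/web_warm_webgpu_fp16_b1_chromium_headed"
```

Appending `.jsonl` to the returned ID gives the file path relative to the results directory.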

## Examples

### Node.js Benchmarks

```
feature-extraction/Xenova/all-MiniLM-L6-v2/node_warm_cpu_fp32_b1.jsonl
feature-extraction/Xenova/all-MiniLM-L6-v2/node_warm_webgpu_fp16_b4.jsonl
feature-extraction/Xenova/all-MiniLM-L6-v2/node_cold_cpu_b1.jsonl
text-generation/meta-llama/Llama-2-7b/node_warm_cpu_q4_b1.jsonl
```

### Web Benchmarks

```
feature-extraction/Xenova/distilbert-base-uncased/web_warm_wasm_b1_chromium.jsonl
feature-extraction/Xenova/distilbert-base-uncased/web_warm_wasm_q8_b1_firefox.jsonl
feature-extraction/Xenova/distilbert-base-uncased/web_warm_webgpu_fp16_b1_chromium_headed.jsonl
feature-extraction/Xenova/roberta-large-mnli/web_cold_wasm_b1_chromium.jsonl
```

### Mixed Tasks and Models

```
benchmark-results/
├── feature-extraction/
│   └── Xenova/
│       ├── all-MiniLM-L6-v2/
│       │   ├── node_warm_cpu_fp32_b1.jsonl
│       │   └── web_warm_wasm_b1_chromium.jsonl
│       └── distilbert-base-uncased/
│           └── node_warm_webgpu_fp16_b1.jsonl
└── text-classification/
    └── Xenova/
        └── distilbert-base-uncased/
            └── node_warm_cpu_fp32_b1.jsonl
```
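
Because DType, Browser, and Headed are optional, decoding a file name back into its components takes a little care. The sketch below shows one way to do it, assuming that no component value contains an underscore and that the batch token always has the form `b{N}`. `ParsedParams` and `parseParams` are hypothetical names, not part of the project.

```typescript
// Hypothetical result shape for a decoded file name.
interface ParsedParams {
  platform: string;
  mode: string;
  device: string;
  dtype: string | undefined;
  batchSize: number;
  browser: string | undefined;
  headed: boolean;
}

// Decode "{platform}_{mode}_{device}[_{dtype}]_b{N}[_{browser}][_headed][.jsonl]".
// Assumes no component value contains an underscore.
function parseParams(fileName: string): ParsedParams {
  const tokens = fileName.replace(/\.jsonl$/, "").split("_");
  const [platform, mode, device, ...rest] = tokens;
  if (!platform || !mode || !device) {
    throw new Error("Too few components in: " + fileName);
  }
  const batchPattern = /^b\d+$/;
  // The dtype is present only if the token after the device is not the batch token.
  let dtype: string | undefined;
  const next = rest[0];
  if (next !== undefined && !batchPattern.test(next)) {
    dtype = rest.shift();
  }
  const batchToken = rest.shift();
  if (batchToken === undefined || !batchPattern.test(batchToken)) {
    throw new Error("Missing batch size in: " + fileName);
  }
  // "headed", when present, is always the last token.
  let headed = false;
  if (rest[rest.length - 1] === "headed") {
    headed = true;
    rest.pop();
  }
  const browser = rest.shift();
  return { platform, mode, device, dtype, batchSize: Number(batchToken.slice(1)), browser, headed };
}

console.log(parseParams("web_warm_webgpu_fp16_b1_chromium_headed.jsonl"));
```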

## File Format

Each file is in JSONL (JSON Lines) format, with one benchmark result per line. This allows:
- Appending new results without parsing the entire file
- Streaming large result sets
- Easy analysis with tools like `jq`

Example:
```jsonl
{"id":"uuid1","platform":"node","modelId":"Xenova/all-MiniLM-L6-v2","task":"feature-extraction",...}
{"id":"uuid2","platform":"node","modelId":"Xenova/all-MiniLM-L6-v2","task":"feature-extraction",...}
{"id":"uuid3","platform":"node","modelId":"Xenova/all-MiniLM-L6-v2","task":"feature-extraction",...}
```
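
The append-only property can be illustrated with Node's built-in `fs` module. This is a minimal sketch rather than the server's actual code: `appendResult` and `readResults` are hypothetical helpers, and the demo writes to a temporary directory instead of the real results directory.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Append one result as a single JSON line, creating parent directories as needed.
function appendResult(filePath: string, result: Record<string, unknown>): void {
  fs.mkdirSync(path.dirname(filePath), { recursive: true });
  fs.appendFileSync(filePath, JSON.stringify(result) + "\n");
}

// Read every result back, skipping blank lines.
function readResults(filePath: string): Record<string, unknown>[] {
  return fs
    .readFileSync(filePath, "utf8")
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as Record<string, unknown>);
}

// Demo in a throwaway directory.
const demoRoot = fs.mkdtempSync(path.join(os.tmpdir(), "benchmark-results-"));
const demoFile = path.join(
  demoRoot,
  "feature-extraction",
  "Xenova",
  "all-MiniLM-L6-v2",
  "node_warm_cpu_fp32_b1.jsonl"
);
appendResult(demoFile, { id: "uuid1", platform: "node", task: "feature-extraction" });
appendResult(demoFile, { id: "uuid2", platform: "node", task: "feature-extraction" });
console.log(readResults(demoFile).length); // prints 2
```

Reading the whole file is fine for small result sets; for very large files a line-by-line stream would avoid loading everything into memory.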

## Querying Results

### API Endpoints

1. **Get all results**:
   ```bash
   curl http://localhost:7860/api/benchmarks
   ```

2. **Get results by model**:
   ```bash
   curl "http://localhost:7860/api/benchmarks?modelId=Xenova/all-MiniLM-L6-v2"
   ```

3. **Get specific benchmark**:
   ```bash
   curl http://localhost:7860/api/benchmark/{uuid}
   ```
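
When calling these endpoints from code, the model ID has to be placed in the query string. The sketch below only builds the URLs; `benchmarksUrl` and `benchmarkUrl` are hypothetical helpers, and the response format is not covered here. It percent-encodes the slash in the model ID, on the assumption that the server decodes query parameters in the standard way.

```typescript
// Build the list URL, optionally filtered by model ID.
function benchmarksUrl(baseUrl: string, modelId?: string): string {
  const url = baseUrl.replace(/\/+$/, "") + "/api/benchmarks";
  if (modelId === undefined) return url;
  return url + "?modelId=" + encodeURIComponent(modelId);
}

// Build the URL for a single benchmark result.
function benchmarkUrl(baseUrl: string, uuid: string): string {
  return baseUrl.replace(/\/+$/, "") + "/api/benchmark/" + encodeURIComponent(uuid);
}

console.log(benchmarksUrl("http://localhost:7860", "Xenova/all-MiniLM-L6-v2"));
// prints "http://localhost:7860/api/benchmarks?modelId=Xenova%2Fall-MiniLM-L6-v2"
```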

### Direct File Access

Results can also be queried directly from the filesystem:

```bash
# All results for a specific task (explicit {org}/{model-name} depth; `**` only recurses in bash after `shopt -s globstar`)
cat benchmark-results/feature-extraction/*/*/*.jsonl | jq

# All results for a specific model across all tasks
cat benchmark-results/*/Xenova/all-MiniLM-L6-v2/*.jsonl | jq

# All results for a specific model and task
cat benchmark-results/feature-extraction/Xenova/all-MiniLM-L6-v2/*.jsonl | jq

# Specific configuration
cat benchmark-results/feature-extraction/Xenova/all-MiniLM-L6-v2/node_warm_cpu_fp32_b1.jsonl | jq

# Count results per configuration
wc -l benchmark-results/feature-extraction/Xenova/all-MiniLM-L6-v2/*.jsonl

# Filter by device across all models
cat benchmark-results/feature-extraction/*/*/web_*_wasm_*.jsonl | jq

# Compare same model across different tasks
cat benchmark-results/*/Xenova/distilbert-base-uncased/node_warm_cpu_fp32_b1.jsonl | jq
```
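
The same kind of query can be done programmatically. The sketch below walks a results directory and counts results per configuration, roughly what the `wc -l` command above does across all tasks and models. `findJsonlFiles` and `countResults` are hypothetical helpers, and the demo runs against a temporary directory.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Recursively collect every .jsonl file below `dir`.
function findJsonlFiles(dir: string): string[] {
  const found: string[] = [];
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      found.push(...findJsonlFiles(full));
    } else if (/\.jsonl$/.test(entry.name)) {
      found.push(full);
    }
  }
  return found.sort();
}

// Map each configuration ID (path relative to baseDir, without ".jsonl")
// to the number of non-empty lines in its file.
function countResults(baseDir: string): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const file of findJsonlFiles(baseDir)) {
    const lines = fs.readFileSync(file, "utf8").split("\n").filter((l) => l.trim() !== "");
    const id = path.relative(baseDir, file).replace(/\.jsonl$/, "").split(path.sep).join("/");
    counts[id] = lines.length;
  }
  return counts;
}

// Demo against a throwaway directory.
const demoBase = fs.mkdtempSync(path.join(os.tmpdir(), "benchmark-results-"));
const demoDir = path.join(demoBase, "feature-extraction", "Xenova", "all-MiniLM-L6-v2");
fs.mkdirSync(demoDir, { recursive: true });
fs.writeFileSync(path.join(demoDir, "node_warm_cpu_fp32_b1.jsonl"), '{"id":"uuid1"}\n{"id":"uuid2"}\n');
console.log(countResults(demoBase));
```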

## Benefits

1. **Task-First Organization**: Primary organization by task type, as models are typically designed for specific tasks
2. **Grouping**: Multiple runs with identical settings are stored together in JSONL files
3. **Easy Comparison**: Compare different models on the same task, or same model across different tasks
4. **Organization**: Clear hierarchy: task → org → model → configurations
5. **Readability**: Filenames are human-readable and self-documenting
6. **Searchability**: Easy to find specific configurations using filesystem tools and glob patterns
7. **Scalability**: Nested directory structure handles thousands of models and tasks
8. **Model ID Preservation**: Full model IDs maintained without sanitization, preserving org/model structure

## Configuration

The base directory can be customized via environment variable:

```bash
export BENCHMARK_RESULTS_DIR=/path/to/results
npm run server
```

Default: `./benchmark-results`
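
As an illustration of how the environment variable and the default might combine with a benchmark ID, consider the sketch below. `resultsBaseDir` and `resultFilePath` are hypothetical helpers; the server's real lookup logic may differ, for example in how it treats an empty value.

```typescript
import * as path from "node:path";

// Resolve the base directory, falling back to the documented default.
// Assumption: an unset or empty variable means "use the default".
function resultsBaseDir(env: Record<string, string | undefined> = process.env): string {
  return env["BENCHMARK_RESULTS_DIR"] || "./benchmark-results";
}

// Turn a full benchmark ID into the path of its JSONL file.
function resultFilePath(id: string, env: Record<string, string | undefined> = process.env): string {
  return path.join(resultsBaseDir(env), id + ".jsonl");
}

console.log(
  resultFilePath("feature-extraction/Xenova/all-MiniLM-L6-v2/node_warm_cpu_fp32_b1", {
    BENCHMARK_RESULTS_DIR: "/path/to/results",
  })
);
```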