whitphx (HF Staff) committed
Commit b59a5d0 · 0 Parent(s)

Move leaderboard app from bench repo

.env.example ADDED
@@ -0,0 +1,6 @@
1
+ # HuggingFace Dataset Repository
2
+ # The dataset repository where benchmark results are stored
3
+ HF_DATASET_REPO=your-username/your-dataset-repo
4
+
5
+ # HuggingFace API Token (optional, for private datasets)
6
+ HF_TOKEN=your_token_here
.gitignore ADDED
@@ -0,0 +1,41 @@
1
+ # Python
2
+ __pycache__/
3
+ *.py[cod]
4
+ *$py.class
5
+ *.so
6
+ .Python
7
+ build/
8
+ develop-eggs/
9
+ dist/
10
+ downloads/
11
+ eggs/
12
+ .eggs/
13
+ lib/
14
+ lib64/
15
+ parts/
16
+ sdist/
17
+ var/
18
+ wheels/
19
+ *.egg-info/
20
+ .installed.cfg
21
+ *.egg
22
+
23
+ # Virtual Environment
24
+ .venv/
25
+ venv/
26
+ ENV/
27
+ env/
28
+
29
+ # Environment variables
30
+ .env
31
+
32
+ # IDE
33
+ .vscode/
34
+ .idea/
35
+ *.swp
36
+ *.swo
37
+ *~
38
+
39
+ # OS
40
+ .DS_Store
41
+ Thumbs.db
.python-version ADDED
@@ -0,0 +1 @@
1
+ 3.13
README.md ADDED
@@ -0,0 +1,203 @@
1
+ ---
2
+ title: Transformers.js Benchmark Leaderboard
3
+ emoji: 🏆
4
+ colorFrom: blue
5
+ colorTo: purple
6
+ sdk: gradio
7
+ sdk_version: 5.49.1
8
+ app_file: src/leaderboard/app.py
9
+ pinned: false
10
+ ---
11
+
12
+ # Transformers.js Benchmark Leaderboard
13
+
14
+ A Gradio-based leaderboard that displays benchmark results from a HuggingFace Dataset repository.
15
+
16
+ ## Features
17
+
18
+ - 📊 Display benchmark results in a searchable/filterable table
19
+ - 🔍 Filter by model name, task, platform, device, mode, and dtype
20
+ - 🔄 Refresh data on demand from HuggingFace Dataset
21
+ - 📈 View performance metrics (load time, inference time, p50/p90 percentiles)
22
+
23
+ ## Setup
24
+
25
+ 1. Install dependencies:
26
+ ```bash
27
+ uv sync
28
+ ```
29
+
30
+ 2. Configure environment variables:
31
+ ```bash
32
+ cp .env.example .env
33
+ ```
34
+
35
+ Edit `.env` and set:
36
+ - `HF_DATASET_REPO`: Your HuggingFace dataset repository (e.g., `username/transformersjs-benchmarks`)
37
+ - `HF_TOKEN`: Your HuggingFace API token (optional, for private datasets)
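
A filled-in `.env` (following the `.env.example` template in this repository) might look like:

```bash
# .env (placeholder values)
HF_DATASET_REPO=your-username/transformersjs-benchmarks
HF_TOKEN=your_token_here  # optional, only needed for private datasets
```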
38
+
39
+ ## Usage
40
+
41
+ Run the leaderboard:
42
+
43
+ ```bash
44
+ uv run python -m leaderboard.app
45
+ ```
46
+
47
+ Or using the installed script:
48
+
49
+ ```bash
50
+ uv run leaderboard
51
+ ```
52
+
53
+ The leaderboard will be available at: http://localhost:7861
54
+
55
+ ## Data Format
56
+
57
+ The leaderboard reads JSON result files from the HuggingFace Dataset repository. Each file contains a JSON object with the following structure:
58
+
59
+ ```json
60
+ {
61
+ "id": "benchmark-id",
62
+ "platform": "web",
63
+ "modelId": "Xenova/all-MiniLM-L6-v2",
64
+ "task": "feature-extraction",
65
+ "mode": "warm",
66
+ "repeats": 3,
67
+ "batchSize": 1,
68
+ "device": "wasm",
69
+ "browser": "chromium",
70
+ "dtype": "fp32",
71
+ "headed": false,
72
+ "status": "completed",
73
+ "timestamp": 1234567890,
74
+ "result": {
75
+ "metrics": {
76
+ "load_ms": {"p50": 100, "p90": 120},
77
+ "first_infer_ms": {"p50": 10, "p90": 15},
78
+ "subsequent_infer_ms": {"p50": 8, "p90": 12}
79
+ },
80
+ "environment": {
81
+ "cpuCores": 10,
82
+ "memory": {"deviceMemory": 8}
83
+ }
84
+ }
85
+ }
86
+ ```
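
As a rough illustration, a record like the one above is flattened into the leaderboard's table columns. The following is a minimal sketch using the field names from the example (metrics nested under `result`); the actual loader lives in `src/leaderboard/data_loader.py` (`flatten_result`):

```python
# Minimal sketch: flatten one benchmark record into flat table columns.
# Assumes metrics are nested under "result" as in the example above;
# the real implementation is flatten_result() in src/leaderboard/data_loader.py.
from datetime import datetime

def flatten_example(record: dict) -> dict:
    metrics = record.get("result", {}).get("metrics", {})
    return {
        "modelId": record.get("modelId", ""),
        "task": record.get("task", ""),
        "device": record.get("device", ""),
        "dtype": record.get("dtype", ""),
        "status": record.get("status", ""),
        # timestamps are stored in milliseconds
        "timestamp": datetime.fromtimestamp(record.get("timestamp", 0) / 1000),
        "load_ms_p50": metrics.get("load_ms", {}).get("p50"),
        "load_ms_p90": metrics.get("load_ms", {}).get("p90"),
        "first_infer_ms_p50": metrics.get("first_infer_ms", {}).get("p50"),
        "subsequent_infer_ms_p50": metrics.get("subsequent_infer_ms", {}).get("p50"),
    }
```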
87
+
88
+ ## Deployment on Hugging Face Spaces
89
+
90
+ This leaderboard is designed to be deployed on [Hugging Face Spaces](https://huggingface.co/spaces) using the Gradio SDK.
91
+
92
+ ### Quick Deploy
93
+
94
+ 1. **Create a new Space** on Hugging Face:
95
+ - Go to https://huggingface.co/new-space
96
+ - Choose **Gradio** as the SDK
97
+ - Set the Space name (e.g., `transformersjs-benchmark-leaderboard`)
98
+
99
+ 2. **Upload files to your Space**:
100
+ ```bash
101
+ # Clone your Space repository
102
+ git clone https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME
103
+ cd YOUR_SPACE_NAME
104
+
105
+ # Copy leaderboard files
106
+ cp -r /path/to/leaderboard/* .
107
+
108
+ # Commit and push
109
+ git add .
110
+ git commit -m "Initial leaderboard deployment"
111
+ git push
112
+ ```
113
+
114
+ 3. **Configure Space secrets**:
115
+ - Go to your Space settings → **Variables and secrets**
116
+ - Add the following secrets:
117
+ - `HF_DATASET_REPO`: Your dataset repository (e.g., `username/benchmark-results`)
118
+ - `HF_TOKEN`: Your HuggingFace API token (for private datasets)
119
+
120
+ 4. **Space will automatically deploy** and be available at:
121
+ ```
122
+ https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME
123
+ ```
124
+
125
+ ### Space Configuration
126
+
127
+ The Space is configured via the YAML frontmatter in `README.md`:
128
+
129
+ ```yaml
130
+ ---
131
+ title: Transformers.js Benchmark Leaderboard
132
+ emoji: 🏆
133
+ colorFrom: blue
134
+ colorTo: purple
135
+ sdk: gradio
136
+ sdk_version: 5.49.1
137
+ app_file: src/leaderboard/app.py
138
+ pinned: false
139
+ ---
140
+ ```
141
+
142
+ **Key configuration options:**
143
+ - `sdk`: Must be `gradio` for Gradio apps
144
+ - `sdk_version`: Gradio version (matches your `pyproject.toml`)
145
+ - `app_file`: Path to the main Python file (relative to repository root)
146
+ - `pinned`: Set to `true` to pin the Space on your profile
147
+
148
+ ### Requirements
149
+
150
+ The Space will automatically install dependencies from `pyproject.toml`:
151
+ - `gradio>=5.49.1`
152
+ - `pandas`
153
+ - `huggingface-hub`
154
+ - `python-dotenv`
155
+
156
+ ### Environment Variables
157
+
158
+ Set these in your Space settings or in a `.env` file (not recommended for production):
159
+
160
+ | Variable | Required | Description |
161
+ |----------|----------|-------------|
162
+ | `HF_DATASET_REPO` | Yes | HuggingFace dataset repository containing benchmark results |
163
+ | `HF_TOKEN` | No | HuggingFace API token (only for private datasets) |
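
For reference, a minimal sketch of how the app picks these up at startup (mirroring `src/leaderboard/app.py`):

```python
# Minimal sketch of the configuration loading done in src/leaderboard/app.py.
import os
from dotenv import load_dotenv

load_dotenv()  # picks up a local .env file if present; Space secrets arrive as plain env vars

HF_DATASET_REPO = os.getenv("HF_DATASET_REPO")  # required to load any benchmark data
HF_TOKEN = os.getenv("HF_TOKEN")                # optional, only for private datasets

if not HF_DATASET_REPO:
    print("HF_DATASET_REPO not configured; the leaderboard will show no data.")
```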
164
+
165
+ ### Auto-Restart
166
+
167
+ Spaces automatically restart when:
168
+ - Code is pushed to the repository
169
+ - Dependencies are updated
170
+ - Environment variables are changed
171
+
172
+ ### Monitoring
173
+
174
+ - View logs in the Space's **Logs** tab
175
+ - Check status in the **Settings** tab
176
+ - Monitor resource usage (CPU, memory)
177
+
178
+ ## Development
179
+
180
+ The leaderboard is built with:
181
+ - **Gradio**: Web UI framework
182
+ - **Pandas**: Data manipulation
183
+ - **HuggingFace Hub**: Dataset loading
184
+
185
+ ### Local Development
186
+
187
+ 1. Install dependencies:
188
+ ```bash
189
+ uv sync
190
+ ```
191
+
192
+ 2. Set environment variables:
193
+ ```bash
194
+ export HF_DATASET_REPO="your-username/benchmark-results"
195
+ export HF_TOKEN="your-hf-token" # Optional
196
+ ```
197
+
198
+ 3. Run locally:
199
+ ```bash
200
+ uv run python -m leaderboard.app
201
+ ```
202
+
203
+ 4. Access at: http://localhost:7861
main.py ADDED
@@ -0,0 +1,6 @@
1
+ def main():
2
+ print("Hello from leaderboard!")
3
+
4
+
5
+ if __name__ == "__main__":
6
+ main()
pyproject.toml ADDED
@@ -0,0 +1,24 @@
1
+ [project]
2
+ name = "leaderboard"
3
+ version = "0.1.0"
4
+ description = "Transformers.js Benchmark Leaderboard - Display benchmark results from HuggingFace Dataset"
5
+ requires-python = ">=3.13"
6
+ dependencies = [
7
+ "gradio>=5.49.1",
8
+ "huggingface-hub>=0.35.3",
9
+ "pandas>=2.3.3",
10
+ "python-dotenv>=1.1.1",
11
+ ]
12
+
13
+ [project.scripts]
14
+ leaderboard = "leaderboard.app:create_leaderboard_ui"
15
+
16
+ [build-system]
17
+ requires = ["hatchling"]
18
+ build-backend = "hatchling.build"
19
+
20
+ [tool.hatch.build.targets.wheel]
21
+ packages = ["src/leaderboard"]
22
+
23
+ [tool.uv]
24
+ package = true
src/leaderboard/__init__.py ADDED
@@ -0,0 +1,15 @@
1
+ """Transformers.js Benchmark Leaderboard"""
2
+
3
+ from .app import create_leaderboard_ui
4
+ from .data_loader import load_benchmark_data, get_unique_values, flatten_result, get_first_timer_friendly_models
5
+ from .formatters import apply_formatting
6
+
7
+ __version__ = "0.1.0"
8
+ __all__ = [
9
+ "create_leaderboard_ui",
10
+ "load_benchmark_data",
11
+ "get_unique_values",
12
+ "flatten_result",
13
+ "get_first_timer_friendly_models",
14
+ "apply_formatting",
15
+ ]
src/leaderboard/app.py ADDED
@@ -0,0 +1,330 @@
1
+ """
2
+ Transformers.js Benchmark Leaderboard
3
+
4
+ A Gradio app that displays benchmark results from a HuggingFace Dataset repository.
5
+ """
6
+
7
+ import os
8
+ import logging
9
+ import pandas as pd
10
+ import gradio as gr
11
+ from dotenv import load_dotenv
12
+
13
+ from leaderboard.data_loader import (
14
+ load_benchmark_data,
15
+ get_unique_values,
16
+ get_webgpu_beginner_friendly_models,
17
+ format_recommended_models_as_markdown,
18
+ )
19
+ from leaderboard.formatters import apply_formatting
20
+
21
+ # Configure logging
22
+ logging.basicConfig(
23
+ level=logging.INFO,
24
+ format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
25
+ datefmt='%Y-%m-%d %H:%M:%S'
26
+ )
27
+
28
+ # Load environment variables
29
+ load_dotenv()
30
+
31
+ HF_DATASET_REPO = os.getenv("HF_DATASET_REPO")
32
+ HF_TOKEN = os.getenv("HF_TOKEN")
33
+
34
+
35
+ def load_data() -> pd.DataFrame:
36
+ """Load benchmark data from configured HF Dataset repository."""
37
+ # Load raw data
38
+ df = load_benchmark_data(
39
+ dataset_repo=HF_DATASET_REPO,
40
+ token=HF_TOKEN,
41
+ )
42
+
43
+ return df
44
+
45
+
46
+ def format_dataframe(df: pd.DataFrame) -> pd.DataFrame:
47
+ """Apply formatting to dataframe for display."""
48
+ if df.empty:
49
+ return df
50
+
51
+ return df.apply(lambda row: pd.Series(apply_formatting(row.to_dict())), axis=1)
52
+
53
+
54
+ def filter_data(
55
+ df: pd.DataFrame,
56
+ model_filter: str,
57
+ task_filter: str,
58
+ platform_filter: str,
59
+ device_filter: str,
60
+ mode_filter: str,
61
+ dtype_filter: str,
62
+ status_filter: str,
63
+ ) -> pd.DataFrame:
64
+ """Filter benchmark data based on user inputs."""
65
+ if df.empty:
66
+ return df
67
+
68
+ filtered = df.copy()
69
+
70
+ # Model name filter
71
+ if model_filter:
72
+ filtered = filtered[
73
+ filtered["modelId"].str.contains(model_filter, case=False, na=False)
74
+ ]
75
+
76
+ # Task filter
77
+ if task_filter and task_filter != "All":
78
+ filtered = filtered[filtered["task"] == task_filter]
79
+
80
+ # Platform filter
81
+ if platform_filter and platform_filter != "All":
82
+ filtered = filtered[filtered["platform"] == platform_filter]
83
+
84
+ # Device filter
85
+ if device_filter and device_filter != "All":
86
+ filtered = filtered[filtered["device"] == device_filter]
87
+
88
+ # Mode filter
89
+ if mode_filter and mode_filter != "All":
90
+ filtered = filtered[filtered["mode"] == mode_filter]
91
+
92
+ # DType filter
93
+ if dtype_filter and dtype_filter != "All":
94
+ filtered = filtered[filtered["dtype"] == dtype_filter]
95
+
96
+ # Status filter
97
+ if status_filter and status_filter != "All":
98
+ filtered = filtered[filtered["status"] == status_filter]
99
+
100
+ return filtered
101
+
102
+
103
+ def create_leaderboard_ui():
104
+ """Create the Gradio UI for the leaderboard."""
105
+
106
+ # Load initial data
107
+ df = load_data()
108
+ formatted_df = format_dataframe(df)
109
+
110
+ with gr.Blocks(title="Transformers.js Benchmark Leaderboard") as demo:
111
+ # Cache raw data in Gradio state to avoid reloading on every filter change
112
+ raw_data_state = gr.State(df)
113
+ gr.Markdown("# 🏆 Transformers.js Benchmark Leaderboard")
114
+ gr.Markdown(
115
+ "Compare benchmark results for different models, platforms, and configurations."
116
+ )
117
+
118
+ if not HF_DATASET_REPO:
119
+ gr.Markdown(
120
+ "⚠️ **HF_DATASET_REPO not configured.** "
121
+ "Please set the environment variable to load benchmark data."
122
+ )
123
+
124
+ gr.Markdown(
125
+ "💡 **Tip:** Use the recommended models section below to find popular models "
126
+ "that are fast to load and quick to run - perfect for getting started!"
127
+ )
128
+
129
+ # Recommended models section
130
+ gr.Markdown("## ⭐ Recommended WebGPU Models for Beginners")
131
+ gr.Markdown(
132
+ "These models are selected for being:\n"
133
+ "- **WebGPU compatible** - Work in modern browsers with GPU acceleration\n"
134
+ "- **Beginner-friendly** - Popular, fast to load, and quick to run\n"
135
+ "- Sorted by task type, showing top 3-5 models per task"
136
+ )
137
+
138
+ # Get recommended models
139
+ recommended_models = get_webgpu_beginner_friendly_models(df, limit_per_task=5)
140
+ formatted_recommended = format_dataframe(recommended_models)
141
+ markdown_output = format_recommended_models_as_markdown(recommended_models)
142
+
143
+ recommended_table = gr.DataFrame(
144
+ value=formatted_recommended,
145
+ label="Top WebGPU-Compatible Models by Task",
146
+ interactive=False,
147
+ wrap=True,
148
+ )
149
+
150
+ gr.Markdown("### 📝 Markdown Output for llms.txt")
151
+ gr.Markdown(
152
+ "Copy the markdown below to embed in your llms.txt or documentation:"
153
+ )
154
+
155
+ markdown_textbox = gr.Textbox(
156
+ value=markdown_output,
157
+ label="Markdown for llms.txt",
158
+ lines=20,
159
+ max_lines=30,
160
+ show_copy_button=True,
161
+ interactive=False,
162
+ )
163
+
164
+ gr.Markdown("---")
165
+ gr.Markdown("## 🔍 Full Benchmark Results")
166
+
167
+ with gr.Row():
168
+ refresh_btn = gr.Button("🔄 Refresh Data", variant="primary")
169
+
170
+ with gr.Row():
171
+ model_filter = gr.Textbox(
172
+ label="Model Name",
173
+ placeholder="Filter by model name (e.g., 'bert', 'gpt')",
174
+ )
175
+ task_filter = gr.Dropdown(
176
+ label="Task",
177
+ choices=get_unique_values(df, "task"),
178
+ value="All",
179
+ )
180
+
181
+ with gr.Row():
182
+ platform_filter = gr.Dropdown(
183
+ label="Platform",
184
+ choices=get_unique_values(df, "platform"),
185
+ value="All",
186
+ )
187
+ device_filter = gr.Dropdown(
188
+ label="Device",
189
+ choices=get_unique_values(df, "device"),
190
+ value="All",
191
+ )
192
+
193
+ with gr.Row():
194
+ mode_filter = gr.Dropdown(
195
+ label="Mode",
196
+ choices=get_unique_values(df, "mode"),
197
+ value="All",
198
+ )
199
+ dtype_filter = gr.Dropdown(
200
+ label="DType",
201
+ choices=get_unique_values(df, "dtype"),
202
+ value="All",
203
+ )
204
+ status_filter = gr.Dropdown(
205
+ label="Status",
206
+ choices=get_unique_values(df, "status"),
207
+ value="All",
208
+ )
209
+
210
+ results_table = gr.DataFrame(
211
+ value=formatted_df,
212
+ label="All Benchmark Results",
213
+ interactive=False,
214
+ wrap=True,
215
+ )
216
+
217
+ gr.Markdown("### 📊 Metrics")
218
+ gr.Markdown(
219
+ "**Benchmark Metrics:**\n"
220
+ "- **load_ms**: Model loading time in milliseconds\n"
221
+ "- **first_infer_ms**: First inference time in milliseconds\n"
222
+ "- **subsequent_infer_ms**: Subsequent inference time in milliseconds\n"
223
+ "- **p50/p90**: 50th and 90th percentile values\n\n"
224
+ "**HuggingFace Metrics:**\n"
225
+ "- **downloads**: Total downloads from HuggingFace Hub\n"
226
+ "- **likes**: Number of likes on HuggingFace Hub\n\n"
227
+ "**WebGPU Compatibility:**\n"
228
+ "- Models in the recommended section are all WebGPU compatible\n"
229
+ "- WebGPU enables GPU acceleration in modern browsers"
230
+ )
231
+
232
+ def update_data():
233
+ """Reload data from HuggingFace."""
234
+ new_df = load_data()
235
+ formatted_new_df = format_dataframe(new_df)
236
+
237
+ # Update recommended models
238
+ new_recommended = get_webgpu_beginner_friendly_models(new_df, limit_per_task=5)
239
+ formatted_new_recommended = format_dataframe(new_recommended)
240
+ new_markdown = format_recommended_models_as_markdown(new_recommended)
241
+
242
+ return (
243
+ new_df, # Update cached raw data
244
+ formatted_new_recommended, # Update recommended models
245
+ new_markdown, # Update markdown output
246
+ formatted_new_df,
247
+ gr.update(choices=get_unique_values(new_df, "task")),
248
+ gr.update(choices=get_unique_values(new_df, "platform")),
249
+ gr.update(choices=get_unique_values(new_df, "device")),
250
+ gr.update(choices=get_unique_values(new_df, "mode")),
251
+ gr.update(choices=get_unique_values(new_df, "dtype")),
252
+ gr.update(choices=get_unique_values(new_df, "status")),
253
+ )
254
+
255
+ def apply_filters(raw_df, model, task, platform, device, mode, dtype, status):
256
+ """Apply filters and return filtered DataFrame."""
257
+ # Use cached raw data instead of reloading
258
+ filtered = filter_data(raw_df, model, task, platform, device, mode, dtype, status)
259
+ return format_dataframe(filtered)
260
+
261
+ # Refresh button updates data and resets filters
262
+ refresh_btn.click(
263
+ fn=update_data,
264
+ outputs=[
265
+ raw_data_state,
266
+ recommended_table,
267
+ markdown_textbox,
268
+ results_table,
269
+ task_filter,
270
+ platform_filter,
271
+ device_filter,
272
+ mode_filter,
273
+ dtype_filter,
274
+ status_filter,
275
+ ],
276
+ )
277
+
278
+ # Filter inputs update the table (using cached raw data)
279
+ filter_inputs = [
280
+ raw_data_state,
281
+ model_filter,
282
+ task_filter,
283
+ platform_filter,
284
+ device_filter,
285
+ mode_filter,
286
+ dtype_filter,
287
+ status_filter,
288
+ ]
289
+
290
+ model_filter.change(
291
+ fn=apply_filters,
292
+ inputs=filter_inputs,
293
+ outputs=results_table,
294
+ )
295
+ task_filter.change(
296
+ fn=apply_filters,
297
+ inputs=filter_inputs,
298
+ outputs=results_table,
299
+ )
300
+ platform_filter.change(
301
+ fn=apply_filters,
302
+ inputs=filter_inputs,
303
+ outputs=results_table,
304
+ )
305
+ device_filter.change(
306
+ fn=apply_filters,
307
+ inputs=filter_inputs,
308
+ outputs=results_table,
309
+ )
310
+ mode_filter.change(
311
+ fn=apply_filters,
312
+ inputs=filter_inputs,
313
+ outputs=results_table,
314
+ )
315
+ dtype_filter.change(
316
+ fn=apply_filters,
317
+ inputs=filter_inputs,
318
+ outputs=results_table,
319
+ )
320
+ status_filter.change(
321
+ fn=apply_filters,
322
+ inputs=filter_inputs,
323
+ outputs=results_table,
324
+ )
325
+
326
+ return demo
327
+
328
+
329
+ demo = create_leaderboard_ui()
330
+ demo.launch(server_name="0.0.0.0", server_port=7861)
src/leaderboard/data_loader.py ADDED
@@ -0,0 +1,820 @@
1
+ """
2
+ Data loader module for loading benchmark results from HuggingFace Dataset.
3
+ """
4
+
5
+ import json
6
+ import logging
7
+ from pathlib import Path
8
+ from typing import List, Dict, Any, Optional
9
+ from datetime import datetime
10
+ import pandas as pd
11
+ from huggingface_hub import snapshot_download, list_models
12
+
13
+ logger = logging.getLogger(__name__)
14
+
15
+
16
+ def load_benchmark_data(
17
+ dataset_repo: str,
18
+ token: Optional[str] = None,
19
+ ) -> pd.DataFrame:
20
+ """Load benchmark data from HuggingFace Dataset repository.
21
+
22
+ Args:
23
+ dataset_repo: HuggingFace dataset repository ID (e.g., "username/dataset-name")
24
+ token: HuggingFace API token (optional, for private datasets)
25
+
26
+ Returns:
27
+ DataFrame containing all benchmark results
28
+ """
29
+ if not dataset_repo:
30
+ return pd.DataFrame()
31
+
32
+ try:
33
+ # Download the entire repository snapshot
34
+ logger.info(f"Downloading dataset snapshot from {dataset_repo}...")
35
+ local_dir = snapshot_download(
36
+ repo_id=dataset_repo,
37
+ repo_type="dataset",
38
+ token=token,
39
+ )
40
+ logger.info(f"Dataset downloaded to {local_dir}")
41
+
42
+ # Find all JSON files in the downloaded directory
43
+ local_path = Path(local_dir)
44
+ json_files = list(local_path.rglob("*.json"))
45
+
46
+ if not json_files:
47
+ logger.warning("No JSON files found in dataset")
48
+ return pd.DataFrame()
49
+
50
+ logger.info(f"Found {len(json_files)} JSON files")
51
+
52
+ # Load all benchmark results
53
+ all_results = []
54
+ for file_path in json_files:
55
+ try:
56
+ with open(file_path, "r") as f:
57
+ result = json.load(f)
58
+
59
+ if result:
60
+ flattened = flatten_result(result)
61
+ all_results.append(flattened)
62
+ except Exception as e:
63
+ logger.error(f"Error loading {file_path}: {e}")
64
+ continue
65
+
66
+ if not all_results:
67
+ return pd.DataFrame()
68
+
69
+ logger.info(f"Loaded {len(all_results)} benchmark results")
70
+
71
+ # Convert to DataFrame
72
+ df = pd.DataFrame(all_results)
73
+
74
+ # Enrich with HuggingFace model metadata
75
+ df = enrich_with_hf_metadata(df)
76
+
77
+ # Add first-timer-friendly score
78
+ df = add_first_timer_score(df)
79
+
80
+ # Sort by model name and timestamp
81
+ if "modelId" in df.columns and "timestamp" in df.columns:
82
+ df = df.sort_values(["modelId", "timestamp"], ascending=[True, False])
83
+
84
+ return df
85
+
86
+ except Exception as e:
87
+ logger.error(f"Error loading benchmark data: {e}")
88
+ return pd.DataFrame()
89
+
90
+
91
+ def flatten_result(result: Dict[str, Any]) -> Dict[str, Any]:
92
+ """Flatten nested benchmark result for display.
93
+
94
+ The HF Dataset format is already flattened by the bench service,
95
+ so we just need to extract the relevant fields.
96
+
97
+ Args:
98
+ result: Raw benchmark result dictionary
99
+
100
+ Returns:
101
+ Flattened dictionary with extracted fields
102
+ """
103
+ # Convert timestamp from milliseconds to datetime
104
+ timestamp_ms = result.get("timestamp", 0)
105
+ timestamp_dt = None
106
+ if timestamp_ms:
107
+ try:
108
+ timestamp_dt = datetime.fromtimestamp(timestamp_ms / 1000)
109
+ except (ValueError, OSError):
110
+ timestamp_dt = None
111
+
112
+ # Determine actual status - if there's an error, it should be "failed"
113
+ status = result.get("status", "")
114
+ if "error" in result:
115
+ status = "failed"
116
+
117
+ flat = {
118
+ "id": result.get("id", ""),
119
+ "platform": result.get("platform", ""),
120
+ "modelId": result.get("modelId", ""),
121
+ "task": result.get("task", ""),
122
+ "mode": result.get("mode", ""),
123
+ "repeats": result.get("repeats", 0),
124
+ "batchSize": result.get("batchSize", 0),
125
+ "device": result.get("device", ""),
126
+ "browser": result.get("browser", ""),
127
+ "dtype": result.get("dtype", ""),
128
+ "headed": result.get("headed", False),
129
+ "status": status,
130
+ "timestamp": timestamp_dt,
131
+ "runtime": result.get("runtime", ""),
132
+ # Initialize metric fields with None (will be filled if metrics exist)
133
+ "load_ms_p50": None,
134
+ "load_ms_p90": None,
135
+ "first_infer_ms_p50": None,
136
+ "first_infer_ms_p90": None,
137
+ "subsequent_infer_ms_p50": None,
138
+ "subsequent_infer_ms_p90": None,
139
+ }
140
+
141
+ # Extract metrics if available (already at top level)
142
+ if "metrics" in result:
143
+ metrics = result["metrics"]
144
+
145
+ # Load time
146
+ if "load_ms" in metrics and "p50" in metrics["load_ms"]:
147
+ flat["load_ms_p50"] = metrics["load_ms"]["p50"]
148
+ flat["load_ms_p90"] = metrics["load_ms"]["p90"]
149
+
150
+ # First inference time
151
+ if "first_infer_ms" in metrics and "p50" in metrics["first_infer_ms"]:
152
+ flat["first_infer_ms_p50"] = metrics["first_infer_ms"]["p50"]
153
+ flat["first_infer_ms_p90"] = metrics["first_infer_ms"]["p90"]
154
+
155
+ # Subsequent inference time
156
+ if "subsequent_infer_ms" in metrics and "p50" in metrics["subsequent_infer_ms"]:
157
+ flat["subsequent_infer_ms_p50"] = metrics["subsequent_infer_ms"]["p50"]
158
+ flat["subsequent_infer_ms_p90"] = metrics["subsequent_infer_ms"]["p90"]
159
+
160
+ # Extract environment info (already at top level)
161
+ if "environment" in result:
162
+ env = result["environment"]
163
+ flat["cpuCores"] = env.get("cpuCores", 0)
164
+ if "memory" in env:
165
+ flat["memory_gb"] = env["memory"].get("deviceMemory", 0)
166
+
167
+ # Calculate duration
168
+ if "completedAt" in result and "startedAt" in result:
169
+ flat["duration_s"] = (result["completedAt"] - result["startedAt"]) / 1000
170
+
171
+ return flat
172
+
173
+
174
+ def enrich_with_hf_metadata(df: pd.DataFrame) -> pd.DataFrame:
175
+ """Enrich benchmark data with HuggingFace model metadata (downloads, likes).
176
+
177
+ Args:
178
+ df: DataFrame containing benchmark results
180
+
181
+ Returns:
182
+ DataFrame with added downloads and likes columns
183
+ """
184
+ if df.empty or "modelId" not in df.columns:
185
+ return df
186
+
187
+ # Get unique model IDs
188
+ model_ids = df["modelId"].unique().tolist()
189
+
190
+ # Fetch metadata for all models
191
+ model_metadata = {}
192
+ logger.info(f"Fetching metadata for {len(model_ids)} models from HuggingFace...")
193
+
194
+ try:
195
+ for model in list_models(filter=["transformers.js"]):
196
+ if model.id in model_ids:
197
+ model_metadata[model.id] = {
198
+ "downloads": model.downloads or 0,
199
+ "likes": model.likes or 0,
200
+ }
201
+
202
+ # Break early if we have all models
203
+ if len(model_metadata) == len(model_ids):
204
+ break
205
+
206
+ except Exception as e:
207
+ logger.error(f"Error fetching HuggingFace metadata: {e}")
208
+
209
+ # Add metadata to dataframe
210
+ df["downloads"] = df["modelId"].map(lambda x: model_metadata.get(x, {}).get("downloads", 0))
211
+ df["likes"] = df["modelId"].map(lambda x: model_metadata.get(x, {}).get("likes", 0))
212
+
213
+ return df
214
+
215
+
216
+ def add_first_timer_score(df: pd.DataFrame) -> pd.DataFrame:
217
+ """Add first-timer-friendly score to all rows in the dataframe.
218
+
219
+ The score is calculated per task, normalized from 0-100 where:
220
+ - Higher score = better for first-timers
221
+ - Based on: downloads (25%), likes (15%), load time (30%), inference time (30%)
222
+
223
+ Args:
224
+ df: DataFrame containing benchmark results
225
+
226
+ Returns:
227
+ DataFrame with added 'first_timer_score' column
228
+ """
229
+ if df.empty:
230
+ return df
231
+
232
+ # Filter only successful benchmarks
233
+ filtered = df[df["status"] == "completed"].copy() if "status" in df.columns else df.copy()
234
+
235
+ if filtered.empty:
236
+ # Add empty score column for failed benchmarks
237
+ df["first_timer_score"] = None
238
+ return df
239
+
240
+ # Check if task column exists
241
+ if "task" not in filtered.columns:
242
+ df["first_timer_score"] = None
243
+ return df
244
+
245
+ # Calculate score per task
246
+ for task in filtered["task"].unique():
247
+ task_mask = filtered["task"] == task
248
+ task_df = filtered[task_mask].copy()
249
+
250
+ if task_df.empty:
251
+ continue
252
+
253
+ # Normalize metrics within this task (0-1 scale)
254
+
255
+ # Downloads score (0-1, higher is better)
256
+ if "downloads" in task_df.columns:
257
+ max_downloads = task_df["downloads"].max()
258
+ downloads_score = task_df["downloads"] / max_downloads if max_downloads > 0 else 0
259
+ else:
260
+ downloads_score = 0
261
+
262
+ # Likes score (0-1, higher is better)
263
+ if "likes" in task_df.columns:
264
+ max_likes = task_df["likes"].max()
265
+ likes_score = task_df["likes"] / max_likes if max_likes > 0 else 0
266
+ else:
267
+ likes_score = 0
268
+
269
+ # Load time score (0-1, lower time is better)
270
+ if "load_ms_p50" in task_df.columns:
271
+ max_load = task_df["load_ms_p50"].max()
272
+ load_score = 1 - (task_df["load_ms_p50"] / max_load) if max_load > 0 else 0
273
+ else:
274
+ load_score = 0
275
+
276
+ # Inference time score (0-1, lower time is better)
277
+ if "first_infer_ms_p50" in task_df.columns:
278
+ max_infer = task_df["first_infer_ms_p50"].max()
279
+ infer_score = 1 - (task_df["first_infer_ms_p50"] / max_infer) if max_infer > 0 else 0
280
+ else:
281
+ infer_score = 0
282
+
283
+ # Calculate weighted score and scale to 0-100
284
+ weighted_score = (
285
+ (downloads_score * 0.25) +
286
+ (likes_score * 0.15) +
287
+ (load_score * 0.30) +
288
+ (infer_score * 0.30)
289
+ ) * 100
290
+
291
+ # Assign scores back to the filtered dataframe
292
+ filtered.loc[task_mask, "first_timer_score"] = weighted_score
293
+
294
+ # Merge scores back to original dataframe
295
+ if "first_timer_score" in filtered.columns:
296
+ df = df.merge(
297
+ filtered[["id", "first_timer_score"]],
298
+ on="id",
299
+ how="left"
300
+ )
301
+ else:
302
+ df["first_timer_score"] = None
303
+
304
+ return df
305
+
306
+
307
+ def get_first_timer_friendly_models(df: pd.DataFrame, limit_per_task: int = 3) -> pd.DataFrame:
308
+ """Identify first-timer-friendly models based on popularity and performance, grouped by task.
309
+
310
+ A model is considered first-timer-friendly if it:
311
+ - Has high downloads (popular)
312
+ - Has fast load times (easy to start)
313
+ - Has fast inference times (quick results)
314
+ - Successfully completed benchmarks
315
+
316
+ Args:
317
+ df: DataFrame containing benchmark results
318
+ limit_per_task: Maximum number of models to return per task
319
+
320
+ Returns:
321
+ DataFrame with top first-timer-friendly models per task
322
+ """
323
+ if df.empty:
324
+ return pd.DataFrame()
325
+
326
+ # Filter only successful benchmarks
327
+ filtered = df[df["status"] == "completed"].copy() if "status" in df.columns else df.copy()
328
+
329
+ if filtered.empty:
330
+ return pd.DataFrame()
331
+
332
+ # Check if task column exists
333
+ if "task" not in filtered.columns:
334
+ logger.warning("Task column not found in dataframe")
335
+ return pd.DataFrame()
336
+
337
+ # Calculate first-timer-friendliness score per task
338
+ all_results = []
339
+
340
+ for task in filtered["task"].unique():
341
+ task_df = filtered[filtered["task"] == task].copy()
342
+
343
+ if task_df.empty:
344
+ continue
345
+
346
+ # Normalize metrics within this task (lower is better for times, higher is better for popularity)
347
+
348
+ # Downloads score (0-1, higher is better)
349
+ if "downloads" in task_df.columns:
350
+ max_downloads = task_df["downloads"].max()
351
+ task_df["downloads_score"] = task_df["downloads"] / max_downloads if max_downloads > 0 else 0
352
+ else:
353
+ task_df["downloads_score"] = 0
354
+
355
+ # Likes score (0-1, higher is better)
356
+ if "likes" in task_df.columns:
357
+ max_likes = task_df["likes"].max()
358
+ task_df["likes_score"] = task_df["likes"] / max_likes if max_likes > 0 else 0
359
+ else:
360
+ task_df["likes_score"] = 0
361
+
362
+ # Load time score (0-1, lower time is better)
363
+ if "load_ms_p50" in task_df.columns:
364
+ max_load = task_df["load_ms_p50"].max()
365
+ task_df["load_score"] = 1 - (task_df["load_ms_p50"] / max_load) if max_load > 0 else 0
366
+ else:
367
+ task_df["load_score"] = 0
368
+
369
+ # Inference time score (0-1, lower time is better)
370
+ if "first_infer_ms_p50" in task_df.columns:
371
+ max_infer = task_df["first_infer_ms_p50"].max()
372
+ task_df["infer_score"] = 1 - (task_df["first_infer_ms_p50"] / max_infer) if max_infer > 0 else 0
373
+ else:
374
+ task_df["infer_score"] = 0
375
+
376
+ # Calculate weighted first-timer-friendliness score
377
+ # Weights: popularity (40%), load time (30%), inference time (30%)
378
+ task_df["first_timer_score"] = (
379
+ (task_df["downloads_score"] * 0.25) +
380
+ (task_df["likes_score"] * 0.15) +
381
+ (task_df["load_score"] * 0.30) +
382
+ (task_df["infer_score"] * 0.30)
383
+ )
384
+
385
+ # Group by model and take best score for each model within this task
386
+ # Filter out NaN scores before getting idxmax
387
+ idx_max_series = task_df.groupby("modelId")["first_timer_score"].idxmax()
388
+ # Drop NaN indices
389
+ valid_indices = idx_max_series.dropna()
390
+ if valid_indices.empty:
391
+ continue
392
+ best_per_model = task_df.loc[valid_indices]
393
+
394
+ # Sort by first-timer score and take top N for this task
395
+ top_for_task = best_per_model.sort_values("first_timer_score", ascending=False).head(limit_per_task)
396
+
397
+ # Drop intermediate scoring columns
398
+ score_cols = ["downloads_score", "likes_score", "load_score", "infer_score", "first_timer_score"]
399
+ top_for_task = top_for_task.drop(columns=[col for col in score_cols if col in top_for_task.columns])
400
+
401
+ all_results.append(top_for_task)
402
+
403
+ if not all_results:
404
+ return pd.DataFrame()
405
+
406
+ # Combine all results
407
+ result = pd.concat(all_results, ignore_index=True)
408
+
409
+ # Sort by task name for better organization
410
+ if "task" in result.columns:
411
+ result = result.sort_values("task")
412
+
413
+ return result
414
+
415
+
416
+ def get_webgpu_beginner_friendly_models(
417
+ df: pd.DataFrame,
418
+ limit_per_task: int = 5
419
+ ) -> pd.DataFrame:
420
+ """Get top beginner-friendly models that are WebGPU compatible, grouped by task.
421
+
422
+ A model is included if it:
423
+ - Has high first_timer_score (popular, fast to load, fast inference)
424
+ - Has successful WebGPU benchmark results (device=webgpu, status=completed)
425
+
426
+ Args:
427
+ df: DataFrame containing benchmark results
428
+ limit_per_task: Maximum number of models to return per task (default: 5)
429
+
430
+ Returns:
431
+ DataFrame with top WebGPU-compatible beginner-friendly models per task
432
+ """
433
+ if df.empty:
434
+ return pd.DataFrame()
435
+
436
+ # Check that the required columns exist before building the filter
437
+ if "device" not in df.columns or "status" not in df.columns:
438
+ logger.warning("Required columns (device, status) not found in dataframe")
439
+ return pd.DataFrame()
440
+
441
+ # Filter for WebGPU benchmarks that completed successfully
442
+ webgpu_filter = (
443
+ (df["device"] == "webgpu") &
444
+ (df["status"] == "completed")
445
+ )
446
+
447
+ filtered = df[webgpu_filter].copy()
448
+
449
+ if filtered.empty:
450
+ logger.warning("No successful WebGPU benchmarks found")
451
+ return pd.DataFrame()
452
+
453
+ # Check if required columns exist
454
+ if "task" not in filtered.columns or "first_timer_score" not in filtered.columns:
455
+ logger.warning("Required columns (task, first_timer_score) not found in filtered dataframe")
456
+ return pd.DataFrame()
457
+
458
+ # Group by task and get top models
459
+ all_results = []
460
+
461
+ for task in filtered["task"].unique():
462
+ task_df = filtered[filtered["task"] == task].copy()
463
+
464
+ if task_df.empty:
465
+ continue
466
+
467
+ # Remove rows with NaN first_timer_score
468
+ task_df = task_df.dropna(subset=["first_timer_score"])
469
+
470
+ if task_df.empty:
471
+ continue
472
+
473
+ # For each model, get the benchmark with the highest first_timer_score
474
+ idx_max_series = task_df.groupby("modelId")["first_timer_score"].idxmax()
475
+ valid_indices = idx_max_series.dropna()
476
+
477
+ if valid_indices.empty:
478
+ continue
479
+
480
+ best_per_model = task_df.loc[valid_indices]
481
+
482
+ # Sort by first_timer_score (descending) and take top N
483
+ top_for_task = best_per_model.sort_values(
484
+ "first_timer_score",
485
+ ascending=False
486
+ ).head(limit_per_task)
487
+
488
+ all_results.append(top_for_task)
489
+
490
+ if not all_results:
491
+ logger.warning("No models found after filtering and grouping")
492
+ return pd.DataFrame()
493
+
494
+ # Combine all results
495
+ result = pd.concat(all_results, ignore_index=True)
496
+
497
+ # Sort by task, then by first_timer_score (descending)
498
+ if "task" in result.columns and "first_timer_score" in result.columns:
499
+ result = result.sort_values(
500
+ ["task", "first_timer_score"],
501
+ ascending=[True, False]
502
+ )
503
+
504
+ return result
505
+
506
+
507
+ def _get_usage_example(task_type: str, repo_id: str) -> tuple[str, str | None]:
508
+ """Get usage example code snippet for a given task type.
509
+
510
+ Args:
511
+ task_type: The task type (e.g., 'text-generation', 'image-classification')
512
+ repo_id: The model repository ID (e.g., 'Xenova/gpt2')
513
+
514
+ Returns:
515
+ Tuple of (code_snippet, description)
516
+ """
517
+ if task_type == "fill-mask":
518
+ return f"""const unmasker = await pipeline('fill-mask', '{repo_id}');
519
+ const output = await unmasker('The goal of life is [MASK].');
520
+ """, 'Perform masked language modelling (a.k.a. "fill-mask")'
521
+ elif task_type == "question-answering":
522
+ return f"""const answerer = await pipeline('question-answering', '{repo_id}');
523
+ const question = 'Who was Jim Henson?';
524
+ const context = 'Jim Henson was a nice puppet.';
525
+ const output = await answerer(question, context);
526
+ """, 'Run question answering'
527
+ elif task_type == "summarization":
528
+ return f"""const generator = await pipeline('summarization', '{repo_id}');
529
+ const text = 'The tower is 324 metres (1,063 ft) tall, about the same height as an 81-storey building, ' +
530
+ 'and the tallest structure in Paris. Its base is square, measuring 125 metres (410 ft) on each side. ' +
531
+ 'During its construction, the Eiffel Tower surpassed the Washington Monument to become the tallest ' +
532
+ 'man-made structure in the world, a title it held for 41 years until the Chrysler Building in New ' +
533
+ 'York City was finished in 1930. It was the first structure to reach a height of 300 metres. Due to ' +
534
+ 'the addition of a broadcasting aerial at the top of the tower in 1957, it is now taller than the ' +
535
+ 'Chrysler Building by 5.2 metres (17 ft). Excluding transmitters, the Eiffel Tower is the second ' +
536
+ 'tallest free-standing structure in France after the Millau Viaduct.';
537
+ const output = await generator(text, {{
538
+ max_new_tokens: 100,
539
+ }});
540
+ """, 'Summarization'
541
+ elif task_type == "sentiment-analysis" or task_type == "text-classification":
542
+ return f"""const classifier = await pipeline('{task_type}', '{repo_id}');
543
+ const output = await classifier('I love transformers!');
544
+ """, None
545
+ elif task_type == "text-generation":
546
+ return f"""const generator = await pipeline('text-generation', '{repo_id}');
547
+ const output = await generator('Once upon a time, there was', {{ max_new_tokens: 10 }});
548
+ """, 'Text generation'
549
+ elif task_type == "text2text-generation":
550
+ return f"""const generator = await pipeline('text2text-generation', '{repo_id}');
551
+ const output = await generator('how can I become more healthy?', {{
552
+ max_new_tokens: 100,
553
+ }});
554
+ """, 'Text-to-text generation'
555
+ elif task_type == "token-classification" or task_type == "ner":
556
+ return f"""const classifier = await pipeline('token-classification', '{repo_id}');
557
+ const output = await classifier('My name is Sarah and I live in London');
558
+ """, 'Perform named entity recognition'
559
+ elif task_type == "translation":
560
+ return f"""const translator = await pipeline('translation', '{repo_id}');
561
+ const output = await translator('Life is like a box of chocolate.', {{
562
+ src_lang: '...',
563
+ tgt_lang: '...',
564
+ }});
565
+ """, 'Multilingual translation'
566
+ elif task_type == "zero-shot-classification":
567
+ return f"""const classifier = await pipeline('zero-shot-classification', '{repo_id}');
568
+ const output = await classifier(
569
+ 'I love transformers!',
570
+ ['positive', 'negative']
571
+ );
572
+ """, 'Zero shot classification'
573
+ elif task_type == "feature-extraction":
574
+ return f"""const extractor = await pipeline('feature-extraction', '{repo_id}');
575
+ const output = await extractor('This is a simple test.');
576
+ """, 'Run feature extraction'
577
+ # Vision
578
+ elif task_type == "background-removal":
579
+ return f"""const segmenter = await pipeline('background-removal', '{repo_id}');
580
+ const url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/portrait-of-woman_small.jpg';
581
+ const output = await segmenter(url);
582
+ """, 'Perform background removal'
583
+ elif task_type == "depth-estimation":
584
+ return f"""const depth_estimator = await pipeline('depth-estimation', '{repo_id}');
585
+ const url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/cats.jpg';
586
+ const out = await depth_estimator(url);
587
+ """, 'Depth estimation'
588
+ elif task_type == "image-classification":
589
+ return f"""const classifier = await pipeline('image-classification', '{repo_id}');
590
+ const url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/tiger.jpg';
591
+ const output = await classifier(url);
592
+ """, 'Classify an image'
593
+ elif task_type == "image-segmentation":
594
+ return f"""const segmenter = await pipeline('image-segmentation', '{repo_id}');
595
+ const url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/cats.jpg';
596
+ const output = await segmenter(url);
597
+ """, 'Perform image segmentation'
598
+ elif task_type == "image-to-image":
599
+ return f"""const processor = await pipeline('image-to-image', '{repo_id}');
600
+ const url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/tiger.jpg';
601
+ const output = await processor(url);
602
+ """, None
603
+ elif task_type == "object-detection":
604
+ return f"""const detector = await pipeline('object-detection', '{repo_id}');
605
+ const img = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/cats.jpg';
606
+ const output = await detector(img, {{ threshold: 0.9 }});
607
+ """, 'Run object-detection'
608
+ elif task_type == "image-feature-extraction":
609
+ return f"""const image_feature_extractor = await pipeline('image-feature-extraction', '{repo_id}');
610
+ const url = 'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/cats.png';
611
+ const features = await image_feature_extractor(url);
612
+ """, 'Perform image feature extraction'
613
+ # Audio
614
+ elif task_type == "audio-classification":
615
+ return f"""const classifier = await pipeline('audio-classification', '{repo_id}');
616
+ const url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/jfk.wav';
617
+ const output = await classifier(url);
618
+ """, 'Perform audio classification'
619
+ elif task_type == "automatic-speech-recognition":
620
+ return f"""const transcriber = await pipeline('automatic-speech-recognition', '{repo_id}');
621
+ const url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/jfk.wav';
622
+ const output = await transcriber(url);
623
+ """, 'Transcribe audio from a URL'
624
+ elif task_type == "text-to-audio" or task_type == "text-to-speech":
625
+ return f"""const synthesizer = await pipeline('text-to-speech', '{repo_id}');
626
+ const output = await synthesizer('Hello, my dog is cute');
627
+ """, 'Generate audio from text'
628
+ # Multimodal
629
+ elif task_type == "document-question-answering":
630
+ return f"""const qa_pipeline = await pipeline('document-question-answering', '{repo_id}');
631
+ const image = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/invoice.png';
632
+ const question = 'What is the invoice number?';
633
+ const output = await qa_pipeline(image, question);
634
+ """, 'Answer questions about a document'
635
+ elif task_type == "image-to-text":
636
+ return f"""const captioner = await pipeline('image-to-text', '{repo_id}');
637
+ const url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/cats.jpg';
638
+ const output = await captioner(url);
639
+ """, 'Generate a caption for an image'
640
+ elif task_type == "zero-shot-audio-classification":
641
+ return f"""const classifier = await pipeline('zero-shot-audio-classification', '{repo_id}');
642
+ const audio = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/dog_barking.wav';
643
+ const candidate_labels = ['dog', 'vacuum cleaner'];
644
+ const scores = await classifier(audio, candidate_labels);
645
+ """, 'Perform zero-shot audio classification'
646
+ elif task_type == "zero-shot-image-classification":
647
+ return f"""const classifier = await pipeline('zero-shot-image-classification', '{repo_id}');
648
+ const url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/tiger.jpg';
649
+ const output = await classifier(url, ['tiger', 'horse', 'dog']);
650
+ """, 'Zero shot image classification'
651
+ elif task_type == "zero-shot-object-detection":
652
+ return f"""const detector = await pipeline('zero-shot-object-detection', '{repo_id}');
653
+ const url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/astronaut.png';
654
+ const candidate_labels = ['human face', 'rocket', 'helmet', 'american flag'];
655
+ const output = await detector(url, candidate_labels);
656
+ """, 'Zero-shot object detection'
657
+ else:
658
+ logger.warning(f"No usage example found for task type: {task_type}")
659
+ return f"""const pipe = await pipeline('{task_type}', '{repo_id}');
660
+ const result = await pipe('input text or data');
661
+ console.log(result);
662
+ """, None
663
+
664
+
665
+ def format_recommended_models_as_markdown(df: pd.DataFrame) -> str:
666
+ """Format recommended WebGPU models as markdown for llms.txt embedding.
667
+
668
+ Args:
669
+ df: DataFrame containing recommended models (output from get_webgpu_beginner_friendly_models)
670
+
671
+ Returns:
672
+ Formatted markdown string
673
+ """
674
+ if df.empty:
675
+ return "No recommended models available."
676
+
677
+ markdown_lines = [
678
+ "# Recommended Transformers.js Models for First-Time Trials",
679
+ "",
680
+ "This guide provides curated model recommendations for each task type, selected for their:",
681
+ "- **Popularity**: Widely used with strong community support",
682
+ "- **Performance**: Fast loading and inference times",
683
+ "- **WebGPU Compatibility**: GPU-accelerated in modern browsers",
684
+ "",
685
+ "**Important:** These recommendations are designed for initial experimentation and learning. "
686
+ "Many other models are available for each task. "
687
+ "**You should evaluate and choose the best model for your specific use case, performance requirements, and constraints.**",
688
+ "",
689
+ "---",
690
+ "",
691
+ "## About the Model Recommendations",
692
+ "",
693
+ "The models below are selected for their popularity and ease of use, making them ideal for initial experimentation. "
694
+ "**This list does not cover all available models** - you should evaluate and select the best model for your specific use case and requirements.",
695
+ "",
696
+ ]
697
+
698
+ # Group by task
699
+ if "task" not in df.columns:
700
+ return "No task information available."
701
+
702
+ for task in sorted(df["task"].unique()):
703
+ task_df = df[df["task"] == task].copy()
704
+
705
+ if task_df.empty:
706
+ continue
707
+
708
+ # Add task header
709
+ markdown_lines.append(f"## {task.title()}")
710
+ markdown_lines.append("")
711
+
712
+ # Sort by first_timer_score descending
713
+ if "first_timer_score" in task_df.columns:
714
+ task_df = task_df.sort_values("first_timer_score", ascending=False)
715
+
716
+ # Get the first/best model for the usage example
717
+ first_row = task_df.iloc[0]
718
+ first_model_id = first_row.get("modelId", "")
719
+
720
+ # Add usage example using the top model
721
+ if first_model_id:
722
+ code_snippet, description = _get_usage_example(task, first_model_id)
723
+
724
+ if description:
725
+ markdown_lines.append(f"**Usage Example:** {description}")
726
+ else:
727
+ markdown_lines.append("**Usage Example:**")
728
+ markdown_lines.append("")
729
+ markdown_lines.append("```javascript")
730
+ markdown_lines.append(code_snippet.strip())
731
+ markdown_lines.append("```")
732
+ markdown_lines.append("")
733
+
734
+ # Add section header for model recommendations
735
+ markdown_lines.append("### Recommended Models for First-Time Trials")
736
+ markdown_lines.append("")
737
+
738
+ # Add each model
739
+ for idx, row in task_df.iterrows():
740
+ model_id = row.get("modelId", "Unknown")
741
+ score = row.get("first_timer_score", None)
742
+ downloads = row.get("downloads", 0)
743
+ likes = row.get("likes", 0)
744
+ load_time = row.get("load_ms_p50", None)
745
+ infer_time = row.get("first_infer_ms_p50", None)
746
+
747
+ # Model entry
748
+ markdown_lines.append(f"### {model_id}")
749
+ markdown_lines.append("")
750
+
751
+ # WebGPU compatibility
752
+ markdown_lines.append("**WebGPU Compatible:** ✅ Yes")
753
+ markdown_lines.append("")
754
+
755
+ # Metrics
756
+ metrics = []
757
+ if load_time is not None:
758
+ metrics.append(f"Load: {load_time:.1f}ms")
759
+ if infer_time is not None:
760
+ metrics.append(f"Inference: {infer_time:.1f}ms")
761
+ if downloads:
762
+ if downloads >= 1_000_000:
763
+ downloads_str = f"{downloads / 1_000_000:.1f}M"
764
+ elif downloads >= 1_000:
765
+ downloads_str = f"{downloads / 1_000:.1f}k"
766
+ else:
767
+ downloads_str = str(downloads)
768
+ metrics.append(f"Downloads: {downloads_str}")
769
+ if likes:
770
+ metrics.append(f"Likes: {likes}")
771
+
772
+ if metrics:
773
+ markdown_lines.append(f"**Metrics:** {' | '.join(metrics)}")
774
+
775
+ markdown_lines.append("")
776
+
777
+ markdown_lines.append("---")
778
+ markdown_lines.append("")
779
+
780
+ # Add footer
781
+ markdown_lines.extend([
782
+ "## About These Recommendations",
783
+ "",
784
+ "### Selection Criteria",
785
+ "",
786
+ "Models in this guide are selected based on:",
787
+ "- **Popularity**: High download counts and community engagement on HuggingFace Hub",
788
+ "- **Performance**: Fast loading and inference times based on benchmark results",
789
+ "- **Compatibility**: Verified WebGPU support for GPU-accelerated browser execution",
790
+ "",
791
+ "### For Production Use",
792
+ "",
793
+ "These recommendations are optimized for first-time trials and learning. "
794
+ "For production applications, consider:",
795
+ "- Evaluating multiple models for your specific use case",
796
+ "- Testing with your actual data and performance requirements",
797
+ "- Reviewing the full benchmark results for comprehensive comparisons",
798
+ "- Exploring specialized models that may better fit your needs",
799
+ "",
800
+ "Visit the full leaderboard to explore all available models and their benchmark results.",
801
+ ])
802
+
803
+ return "\n".join(markdown_lines)
804
+
805
+
806
+ def get_unique_values(df: pd.DataFrame, column: str) -> List[str]:
807
+ """Get unique values from a column for dropdown choices.
808
+
809
+ Args:
810
+ df: DataFrame to extract values from
811
+ column: Column name
812
+
813
+ Returns:
814
+ List of unique values with "All" as first item
815
+ """
816
+ if df.empty or column not in df.columns:
817
+ return ["All"]
818
+
819
+ values = df[column].dropna().unique().tolist()
820
+ return ["All"] + sorted([str(v) for v in values])
src/leaderboard/formatters.py ADDED
@@ -0,0 +1,346 @@
1
+ """
2
+ Formatting utilities for displaying benchmark data with emojis.
3
+ """
4
+
5
+ from typing import Any, Optional
6
+ from datetime import datetime
7
+
8
+
9
+ def format_platform(platform: str) -> str:
10
+ """Format platform with emoji."""
11
+ emoji_map = {
12
+ "node": "🟢",
13
+ "web": "🌐",
14
+ }
15
+ emoji = emoji_map.get(platform, "")
16
+ return f"{emoji} {platform}" if emoji else platform
17
+
18
+
19
+ def format_device(device: str) -> str:
20
+ """Format device with emoji."""
21
+ emoji_map = {
22
+ "wasm": "📦",
23
+ "webgpu": "⚡",
24
+ "cpu": "🖥️",
25
+ "cuda": "🎮",
26
+ }
27
+ emoji = emoji_map.get(device, "")
28
+ return f"{emoji} {device}" if emoji else device
29
+
30
+
31
+ def format_browser(browser: str) -> str:
32
+ """Format browser with emoji."""
33
+ if not browser:
34
+ return ""
35
+
36
+ emoji_map = {
37
+ "chromium": "🔵",
38
+ "chrome": "🔵",
39
+ "firefox": "🦊",
40
+ "webkit": "🧭",
41
+ "safari": "🧭",
42
+ }
43
+ emoji = emoji_map.get(browser.lower(), "")
44
+ return f"{emoji} {browser}" if emoji else browser
45
+
46
+
47
+ def format_status(status: str) -> str:
48
+ """Format status with emoji."""
49
+ emoji_map = {
50
+ "completed": "✅",
51
+ "failed": "❌",
52
+ "running": "🔄",
53
+ "pending": "⏳",
54
+ }
55
+ emoji = emoji_map.get(status, "")
56
+ return f"{emoji} {status}" if emoji else status
57
+
58
+
59
+ def format_mode(mode: str) -> str:
60
+ """Format mode with emoji."""
61
+ emoji_map = {
62
+ "warm": "🔥",
63
+ "cold": "❄️",
64
+ }
65
+ emoji = emoji_map.get(mode, "")
66
+ return f"{emoji} {mode}" if emoji else mode
67
+
68
+
69
+ def format_headed(headed: bool) -> str:
70
+ """Format headed mode with emoji."""
71
+ return "👁️ Yes" if headed else "No"
72
+
73
+
74
+ def format_metric_ms(value: Optional[float], metric_type: str = "inference") -> str:
75
+ """Format metric in milliseconds with performance emoji.
76
+
77
+ Args:
78
+ value: Metric value in milliseconds
79
+ metric_type: Type of metric ('load', 'inference')
80
+
81
+ Returns:
82
+ Formatted string with emoji
83
+ """
84
+ if value is None or value == 0:
85
+ return "-"
86
+
87
+ # Different thresholds for different metric types
88
+ if metric_type == "load":
89
+ # Load time thresholds (in ms)
90
+ if value < 100:
91
+ emoji = "🚀" # Very fast
92
+ elif value < 500:
93
+ emoji = "⚡" # Fast
94
+ elif value < 2000:
95
+ emoji = "✅" # Good
96
+ elif value < 5000:
97
+ emoji = "⚠️" # Slow
98
+ else:
99
+ emoji = "🐌" # Very slow
100
+ else: # inference
101
+ # Inference time thresholds (in ms)
102
+ if value < 5:
103
+ emoji = "🚀" # Very fast
104
+ elif value < 20:
105
+ emoji = "⚡" # Fast
106
+ elif value < 50:
107
+ emoji = "✅" # Good
108
+ elif value < 100:
109
+ emoji = "⚠️" # Slow
110
+ else:
111
+ emoji = "🐌" # Very slow
112
+
113
+ return f"{emoji} {value:.1f}ms"
114
+
115
+
116
+ def format_duration(duration_s: Optional[float]) -> str:
117
+ """Format duration with emoji."""
118
+ if duration_s is None or duration_s == 0:
119
+ return "-"
120
+
121
+ if duration_s < 5:
122
+ emoji = "🚀" # Very fast
123
+ elif duration_s < 15:
124
+ emoji = "⚡" # Fast
125
+ elif duration_s < 60:
126
+ emoji = "✅" # Good
127
+ elif duration_s < 300:
128
+ emoji = "⚠️" # Slow
129
+ else:
130
+ emoji = "🐌" # Very slow
131
+
132
+ return f"{emoji} {duration_s:.1f}s"
133
+
134
+
135
+ def format_memory(memory_gb: Optional[int]) -> str:
136
+ """Format memory with emoji."""
137
+ if memory_gb is None or memory_gb == 0:
138
+ return "-"
139
+
140
+ if memory_gb >= 32:
141
+ emoji = "💪" # High
142
+ elif memory_gb >= 16:
143
+ emoji = "✅" # Good
144
+ elif memory_gb >= 8:
145
+ emoji = "⚠️" # Medium
146
+ else:
147
+ emoji = "📉" # Low
148
+
149
+ return f"{emoji} {memory_gb}GB"
150
+
151
+
152
+ def format_cpu_cores(cores: Optional[int]) -> str:
153
+ """Format CPU cores with emoji."""
154
+ if cores is None or cores == 0:
155
+ return "-"
156
+
157
+ if cores >= 16:
158
+ emoji = "💪" # Many
159
+ elif cores >= 8:
160
+ emoji = "✅" # Good
161
+ elif cores >= 4:
162
+ emoji = "⚠️" # Medium
163
+ else:
164
+ emoji = "📉" # Few
165
+
166
+ return f"{emoji} {cores} cores"
167
+
168
+
169
+ def format_timestamp(timestamp: Optional[datetime]) -> str:
170
+ """Format timestamp as datetime string.
171
+
172
+ Args:
173
+ timestamp: datetime object
174
+
175
+ Returns:
176
+ Formatted datetime string
177
+ """
178
+ if timestamp is None:
179
+ return "-"
180
+
181
+ try:
182
+ # Format as readable datetime
183
+ return timestamp.strftime("%Y-%m-%d %H:%M:%S")
184
+ except (ValueError, AttributeError):
185
+ return str(timestamp)
186
+
187
+
188
+ def format_downloads(downloads: Optional[int]) -> str:
189
+ """Format downloads count with emoji.
190
+
191
+ Args:
192
+ downloads: Number of downloads
193
+
194
+ Returns:
195
+ Formatted string with emoji
196
+ """
197
+ if downloads is None or downloads == 0:
198
+ return "-"
199
+
200
+ # Format large numbers
201
+ if downloads >= 1_000_000:
202
+ formatted = f"{downloads / 1_000_000:.1f}M"
203
+ emoji = "🔥" # Very popular
204
+ elif downloads >= 100_000:
205
+ formatted = f"{downloads / 1_000:.0f}k"
206
+ emoji = "⭐" # Popular
207
+ elif downloads >= 10_000:
208
+ formatted = f"{downloads / 1_000:.1f}k"
209
+ emoji = "✨" # Well-known
210
+ elif downloads >= 1_000:
211
+ formatted = f"{downloads / 1_000:.1f}k"
212
+ emoji = "📊" # Moderate
213
+ else:
214
+ formatted = str(downloads)
215
+ emoji = "📈" # New/niche
216
+
217
+ return f"{emoji} {formatted}"
218
+
219
+
220
+ def format_likes(likes: Optional[int]) -> str:
221
+ """Format likes count with emoji.
222
+
223
+ Args:
224
+ likes: Number of likes
225
+
226
+ Returns:
227
+ Formatted string with emoji
228
+ """
229
+ if likes is None or likes == 0:
230
+ return "-"
231
+
232
+ # Format based on popularity
233
+ if likes >= 1000:
234
+ emoji = "💖" # Very popular
235
+ elif likes >= 100:
236
+ emoji = "❤️" # Popular
237
+ elif likes >= 50:
238
+ emoji = "💙" # Well-liked
239
+ elif likes >= 10:
240
+ emoji = "💚" # Moderate
241
+ else:
242
+ emoji = "🤍" # Few likes
243
+
244
+ return f"{emoji} {likes}"
245
+
246
+
247
+ def format_first_timer_score(score: Optional[float]) -> str:
248
+ """Format first-timer-friendly score with emoji.
249
+
250
+ Args:
251
+ score: First-timer score (0-100)
252
+
253
+ Returns:
254
+ Formatted string with emoji
255
+ """
256
+ if score is None:
257
+ return "-"
258
+
259
+ # Format based on score (0-100 scale)
260
+ if score >= 80:
261
+ emoji = "⭐⭐⭐" # Excellent
262
+ elif score >= 60:
263
+ emoji = "⭐⭐" # Good
264
+ elif score >= 40:
265
+ emoji = "⭐" # Fair
266
+ else:
267
+ emoji = "·" # Below average
268
+
269
+ return f"{emoji} {score:.0f}"
270
+
271
+
272
+ def apply_formatting(df_dict: dict) -> dict:
273
+ """Apply emoji formatting to a benchmark result dictionary.
274
+
275
+ Args:
276
+ df_dict: Dictionary containing benchmark data (one row)
277
+
278
+ Returns:
279
+ Dictionary with formatted values
280
+ """
281
+ formatted = df_dict.copy()
282
+
283
+ # Format categorical fields
284
+ if "platform" in formatted:
285
+ formatted["platform"] = format_platform(formatted["platform"])
286
+
287
+ if "device" in formatted:
288
+ formatted["device"] = format_device(formatted["device"])
289
+
290
+ if "browser" in formatted:
291
+ formatted["browser"] = format_browser(formatted["browser"])
292
+
293
+ if "status" in formatted:
294
+ formatted["status"] = format_status(formatted["status"])
295
+
296
+ if "mode" in formatted:
297
+ formatted["mode"] = format_mode(formatted["mode"])
298
+
299
+ if "headed" in formatted:
300
+ formatted["headed"] = format_headed(formatted["headed"])
301
+
302
+ # Format metrics
303
+ if "load_ms_p50" in formatted:
304
+ formatted["load_ms_p50"] = format_metric_ms(formatted["load_ms_p50"], "load")
305
+
306
+ if "load_ms_p90" in formatted:
307
+ formatted["load_ms_p90"] = format_metric_ms(formatted["load_ms_p90"], "load")
308
+
309
+ if "first_infer_ms_p50" in formatted:
310
+ formatted["first_infer_ms_p50"] = format_metric_ms(formatted["first_infer_ms_p50"], "inference")
311
+
312
+ if "first_infer_ms_p90" in formatted:
313
+ formatted["first_infer_ms_p90"] = format_metric_ms(formatted["first_infer_ms_p90"], "inference")
314
+
315
+ if "subsequent_infer_ms_p50" in formatted:
316
+ formatted["subsequent_infer_ms_p50"] = format_metric_ms(formatted["subsequent_infer_ms_p50"], "inference")
317
+
318
+ if "subsequent_infer_ms_p90" in formatted:
319
+ formatted["subsequent_infer_ms_p90"] = format_metric_ms(formatted["subsequent_infer_ms_p90"], "inference")
320
+
321
+ # Format environment info
322
+ if "memory_gb" in formatted:
323
+ formatted["memory_gb"] = format_memory(formatted["memory_gb"])
324
+
325
+ if "cpuCores" in formatted:
326
+ formatted["cpuCores"] = format_cpu_cores(formatted["cpuCores"])
327
+
328
+ if "duration_s" in formatted:
329
+ formatted["duration_s"] = format_duration(formatted["duration_s"])
330
+
331
+ # Format timestamp
332
+ if "timestamp" in formatted:
333
+ formatted["timestamp"] = format_timestamp(formatted["timestamp"])
334
+
335
+ # Format HuggingFace metadata
336
+ if "downloads" in formatted:
337
+ formatted["downloads"] = format_downloads(formatted["downloads"])
338
+
339
+ if "likes" in formatted:
340
+ formatted["likes"] = format_likes(formatted["likes"])
341
+
342
+ # Format first-timer score
343
+ if "first_timer_score" in formatted:
344
+ formatted["first_timer_score"] = format_first_timer_score(formatted["first_timer_score"])
345
+
346
+ return formatted
uv.lock ADDED
The diff for this file is too large to render. See raw diff