whitphx (HF Staff) committed
Commit b59a5d0 · 0 Parent(s)

Move leaderboard app from bench repo

.env.example ADDED
@@ -0,0 +1,6 @@
1
+ # HuggingFace Dataset Repository
2
+ # The dataset repository where benchmark results are stored
3
+ HF_DATASET_REPO=your-username/your-dataset-repo
4
+
5
+ # HuggingFace API Token (optional, for private datasets)
6
+ HF_TOKEN=your_token_here
.gitignore ADDED
@@ -0,0 +1,41 @@
1
+ # Python
2
+ __pycache__/
3
+ *.py[cod]
4
+ *$py.class
5
+ *.so
6
+ .Python
7
+ build/
8
+ develop-eggs/
9
+ dist/
10
+ downloads/
11
+ eggs/
12
+ .eggs/
13
+ lib/
14
+ lib64/
15
+ parts/
16
+ sdist/
17
+ var/
18
+ wheels/
19
+ *.egg-info/
20
+ .installed.cfg
21
+ *.egg
22
+
23
+ # Virtual Environment
24
+ .venv/
25
+ venv/
26
+ ENV/
27
+ env/
28
+
29
+ # Environment variables
30
+ .env
31
+
32
+ # IDE
33
+ .vscode/
34
+ .idea/
35
+ *.swp
36
+ *.swo
37
+ *~
38
+
39
+ # OS
40
+ .DS_Store
41
+ Thumbs.db
.python-version ADDED
@@ -0,0 +1 @@
1
+ 3.13
README.md ADDED
@@ -0,0 +1,203 @@
1
+ ---
2
+ title: Transformers.js Benchmark Leaderboard
3
+ emoji: 🏆
4
+ colorFrom: blue
5
+ colorTo: purple
6
+ sdk: gradio
7
+ sdk_version: 5.49.1
8
+ app_file: src/leaderboard/app.py
9
+ pinned: false
10
+ ---
11
+
12
+ # Transformers.js Benchmark Leaderboard
13
+
14
+ A Gradio-based leaderboard that displays benchmark results from a HuggingFace Dataset repository.
15
+
16
+ ## Features
17
+
18
+ - 📊 Display benchmark results in a searchable/filterable table
19
+ - 🔍 Filter by model name, task, platform, device, mode, and dtype
20
+ - 🔄 Refresh data on demand from HuggingFace Dataset
21
+ - 📈 View performance metrics (load time, inference time, p50/p90 percentiles)
22
+
23
+ ## Setup
24
+
25
+ 1. Install dependencies:
26
+ ```bash
27
+ uv sync
28
+ ```
29
+
30
+ 2. Configure environment variables:
31
+ ```bash
32
+ cp .env.example .env
33
+ ```
34
+
35
+ Edit `.env` and set:
36
+ - `HF_DATASET_REPO`: Your HuggingFace dataset repository (e.g., `username/transformersjs-benchmarks`)
37
+ - `HF_TOKEN`: Your HuggingFace API token (optional, for private datasets)
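
A filled-in `.env` (following the `.env.example` template in this repository) might look like:

```bash
# .env (placeholder values)
HF_DATASET_REPO=your-username/transformersjs-benchmarks
HF_TOKEN=your_token_here  # optional, only needed for private datasets
```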
38
+
39
+ ## Usage
40
+
41
+ Run the leaderboard:
42
+
43
+ ```bash
44
+ uv run python -m leaderboard.app
45
+ ```
46
+
47
+ Or using the installed script:
48
+
49
+ ```bash
50
+ uv run leaderboard
51
+ ```
52
+
53
+ The leaderboard will be available at: http://localhost:7861
54
+
55
+ ## Data Format
56
+
57
+ The leaderboard reads JSON result files from the HuggingFace Dataset repository. Each file contains a JSON object with the following structure:
58
+
59
+ ```json
60
+ {
61
+ "id": "benchmark-id",
62
+ "platform": "web",
63
+ "modelId": "Xenova/all-MiniLM-L6-v2",
64
+ "task": "feature-extraction",
65
+ "mode": "warm",
66
+ "repeats": 3,
67
+ "batchSize": 1,
68
+ "device": "wasm",
69
+ "browser": "chromium",
70
+ "dtype": "fp32",
71
+ "headed": false,
72
+ "status": "completed",
73
+ "timestamp": 1234567890,
74
+ "result": {
75
+ "metrics": {
76
+ "load_ms": {"p50": 100, "p90": 120},
77
+ "first_infer_ms": {"p50": 10, "p90": 15},
78
+ "subsequent_infer_ms": {"p50": 8, "p90": 12}
79
+ },
80
+ "environment": {
81
+ "cpuCores": 10,
82
+ "memory": {"deviceMemory": 8}
83
+ }
84
+ }
85
+ }
86
+ ```
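
As a rough illustration, a record like the one above is flattened into the leaderboard's table columns. The following is a minimal sketch using the field names from the example (metrics nested under `result`); the actual loader lives in `src/leaderboard/data_loader.py` (`flatten_result`):

```python
# Minimal sketch: flatten one benchmark record into flat table columns.
# Assumes metrics are nested under "result" as in the example above;
# the real implementation is flatten_result() in src/leaderboard/data_loader.py.
from datetime import datetime

def flatten_example(record: dict) -> dict:
    metrics = record.get("result", {}).get("metrics", {})
    return {
        "modelId": record.get("modelId", ""),
        "task": record.get("task", ""),
        "device": record.get("device", ""),
        "dtype": record.get("dtype", ""),
        "status": record.get("status", ""),
        # timestamps are stored in milliseconds
        "timestamp": datetime.fromtimestamp(record.get("timestamp", 0) / 1000),
        "load_ms_p50": metrics.get("load_ms", {}).get("p50"),
        "load_ms_p90": metrics.get("load_ms", {}).get("p90"),
        "first_infer_ms_p50": metrics.get("first_infer_ms", {}).get("p50"),
        "subsequent_infer_ms_p50": metrics.get("subsequent_infer_ms", {}).get("p50"),
    }
```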
87
+
88
+ ## Deployment on Hugging Face Spaces
89
+
90
+ This leaderboard is designed to be deployed on [Hugging Face Spaces](https://huggingface.co/spaces) using the Gradio SDK.
91
+
92
+ ### Quick Deploy
93
+
94
+ 1. **Create a new Space** on Hugging Face:
95
+ - Go to https://huggingface.co/new-space
96
+ - Choose **Gradio** as the SDK
97
+ - Set the Space name (e.g., `transformersjs-benchmark-leaderboard`)
98
+
99
+ 2. **Upload files to your Space**:
100
+ ```bash
101
+ # Clone your Space repository
102
+ git clone https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME
103
+ cd YOUR_SPACE_NAME
104
+
105
+ # Copy leaderboard files
106
+ cp -r /path/to/leaderboard/* .
107
+
108
+ # Commit and push
109
+ git add .
110
+ git commit -m "Initial leaderboard deployment"
111
+ git push
112
+ ```
113
+
114
+ 3. **Configure Space secrets**:
115
+ - Go to your Space settings → **Variables and secrets**
116
+ - Add the following secrets:
117
+ - `HF_DATASET_REPO`: Your dataset repository (e.g., `username/benchmark-results`)
118
+ - `HF_TOKEN`: Your HuggingFace API token (for private datasets)
119
+
120
+ 4. **Space will automatically deploy** and be available at:
121
+ ```
122
+ https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME
123
+ ```
124
+
125
+ ### Space Configuration
126
+
127
+ The Space is configured via the YAML frontmatter in `README.md`:
128
+
129
+ ```yaml
130
+ ---
131
+ title: Transformers.js Benchmark Leaderboard
132
+ emoji: 🏆
133
+ colorFrom: blue
134
+ colorTo: purple
135
+ sdk: gradio
136
+ sdk_version: 5.49.1
137
+ app_file: src/leaderboard/app.py
138
+ pinned: false
139
+ ---
140
+ ```
141
+
142
+ **Key configuration options:**
143
+ - `sdk`: Must be `gradio` for Gradio apps
144
+ - `sdk_version`: Gradio version (matches your `pyproject.toml`)
145
+ - `app_file`: Path to the main Python file (relative to repository root)
146
+ - `pinned`: Set to `true` to pin the Space on your profile
147
+
148
+ ### Requirements
149
+
150
+ The Space will automatically install dependencies from `pyproject.toml`:
151
+ - `gradio>=5.49.1`
152
+ - `pandas`
153
+ - `huggingface-hub`
154
+ - `python-dotenv`
155
+
156
+ ### Environment Variables
157
+
158
+ Set these in your Space settings or in a `.env` file (not recommended for production):
159
+
160
+ | Variable | Required | Description |
161
+ |----------|----------|-------------|
162
+ | `HF_DATASET_REPO` | Yes | HuggingFace dataset repository containing benchmark results |
163
+ | `HF_TOKEN` | No | HuggingFace API token (only for private datasets) |
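
For reference, a minimal sketch of how the app picks these up at startup (mirroring `src/leaderboard/app.py`):

```python
# Minimal sketch of the configuration loading done in src/leaderboard/app.py.
import os
from dotenv import load_dotenv

load_dotenv()  # picks up a local .env file if present; Space secrets arrive as plain env vars

HF_DATASET_REPO = os.getenv("HF_DATASET_REPO")  # required to load any benchmark data
HF_TOKEN = os.getenv("HF_TOKEN")                # optional, only for private datasets

if not HF_DATASET_REPO:
    print("HF_DATASET_REPO not configured; the leaderboard will show no data.")
```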
164
+
165
+ ### Auto-Restart
166
+
167
+ Spaces automatically restart when:
168
+ - Code is pushed to the repository
169
+ - Dependencies are updated
170
+ - Environment variables are changed
171
+
172
+ ### Monitoring
173
+
174
+ - View logs in the Space's **Logs** tab
175
+ - Check status in the **Settings** tab
176
+ - Monitor resource usage (CPU, memory)
177
+
178
+ ## Development
179
+
180
+ The leaderboard is built with:
181
+ - **Gradio**: Web UI framework
182
+ - **Pandas**: Data manipulation
183
+ - **HuggingFace Hub**: Dataset loading
184
+
185
+ ### Local Development
186
+
187
+ 1. Install dependencies:
188
+ ```bash
189
+ uv sync
190
+ ```
191
+
192
+ 2. Set environment variables:
193
+ ```bash
194
+ export HF_DATASET_REPO="your-username/benchmark-results"
195
+ export HF_TOKEN="your-hf-token" # Optional
196
+ ```
197
+
198
+ 3. Run locally:
199
+ ```bash
200
+ uv run python -m leaderboard.app
201
+ ```
202
+
203
+ 4. Access at: http://localhost:7861
main.py ADDED
@@ -0,0 +1,6 @@
1
+ def main():
2
+ print("Hello from leaderboard!")
3
+
4
+
5
+ if __name__ == "__main__":
6
+ main()
pyproject.toml ADDED
@@ -0,0 +1,24 @@
1
+ [project]
2
+ name = "leaderboard"
3
+ version = "0.1.0"
4
+ description = "Transformers.js Benchmark Leaderboard - Display benchmark results from HuggingFace Dataset"
5
+ requires-python = ">=3.13"
6
+ dependencies = [
7
+ "gradio>=5.49.1",
8
+ "huggingface-hub>=0.35.3",
9
+ "pandas>=2.3.3",
10
+ "python-dotenv>=1.1.1",
11
+ ]
12
+
13
+ [project.scripts]
14
+ leaderboard = "leaderboard.app:create_leaderboard_ui"
15
+
16
+ [build-system]
17
+ requires = ["hatchling"]
18
+ build-backend = "hatchling.build"
19
+
20
+ [tool.hatch.build.targets.wheel]
21
+ packages = ["src/leaderboard"]
22
+
23
+ [tool.uv]
24
+ package = true
src/leaderboard/__init__.py ADDED
@@ -0,0 +1,15 @@
1
+ """Transformers.js Benchmark Leaderboard"""
2
+
3
+ from .app import create_leaderboard_ui
4
+ from .data_loader import load_benchmark_data, get_unique_values, flatten_result, get_first_timer_friendly_models
5
+ from .formatters import apply_formatting
6
+
7
+ __version__ = "0.1.0"
8
+ __all__ = [
9
+ "create_leaderboard_ui",
10
+ "load_benchmark_data",
11
+ "get_unique_values",
12
+ "flatten_result",
13
+ "get_first_timer_friendly_models",
14
+ "apply_formatting",
15
+ ]
src/leaderboard/app.py ADDED
@@ -0,0 +1,330 @@
1
+ """
2
+ Transformers.js Benchmark Leaderboard
3
+
4
+ A Gradio app that displays benchmark results from a HuggingFace Dataset repository.
5
+ """
6
+
7
+ import os
8
+ import logging
9
+ import pandas as pd
10
+ import gradio as gr
11
+ from dotenv import load_dotenv
12
+
13
+ from leaderboard.data_loader import (
14
+ load_benchmark_data,
15
+ get_unique_values,
16
+ get_webgpu_beginner_friendly_models,
17
+ format_recommended_models_as_markdown,
18
+ )
19
+ from leaderboard.formatters import apply_formatting
20
+
21
+ # Configure logging
22
+ logging.basicConfig(
23
+ level=logging.INFO,
24
+ format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
25
+ datefmt='%Y-%m-%d %H:%M:%S'
26
+ )
27
+
28
+ # Load environment variables
29
+ load_dotenv()
30
+
31
+ HF_DATASET_REPO = os.getenv("HF_DATASET_REPO")
32
+ HF_TOKEN = os.getenv("HF_TOKEN")
33
+
34
+
35
+ def load_data() -> pd.DataFrame:
36
+ """Load benchmark data from configured HF Dataset repository."""
37
+ # Load raw data
38
+ df = load_benchmark_data(
39
+ dataset_repo=HF_DATASET_REPO,
40
+ token=HF_TOKEN,
41
+ )
42
+
43
+ return df
44
+
45
+
46
+ def format_dataframe(df: pd.DataFrame) -> pd.DataFrame:
47
+ """Apply formatting to dataframe for display."""
48
+ if df.empty:
49
+ return df
50
+
51
+ return df.apply(lambda row: pd.Series(apply_formatting(row.to_dict())), axis=1)
52
+
53
+
54
+ def filter_data(
55
+ df: pd.DataFrame,
56
+ model_filter: str,
57
+ task_filter: str,
58
+ platform_filter: str,
59
+ device_filter: str,
60
+ mode_filter: str,
61
+ dtype_filter: str,
62
+ status_filter: str,
63
+ ) -> pd.DataFrame:
64
+ """Filter benchmark data based on user inputs."""
65
+ if df.empty:
66
+ return df
67
+
68
+ filtered = df.copy()
69
+
70
+ # Model name filter
71
+ if model_filter:
72
+ filtered = filtered[
73
+ filtered["modelId"].str.contains(model_filter, case=False, na=False)
74
+ ]
75
+
76
+ # Task filter
77
+ if task_filter and task_filter != "All":
78
+ filtered = filtered[filtered["task"] == task_filter]
79
+
80
+ # Platform filter
81
+ if platform_filter and platform_filter != "All":
82
+ filtered = filtered[filtered["platform"] == platform_filter]
83
+
84
+ # Device filter
85
+ if device_filter and device_filter != "All":
86
+ filtered = filtered[filtered["device"] == device_filter]
87
+
88
+ # Mode filter
89
+ if mode_filter and mode_filter != "All":
90
+ filtered = filtered[filtered["mode"] == mode_filter]
91
+
92
+ # DType filter
93
+ if dtype_filter and dtype_filter != "All":
94
+ filtered = filtered[filtered["dtype"] == dtype_filter]
95
+
96
+ # Status filter
97
+ if status_filter and status_filter != "All":
98
+ filtered = filtered[filtered["status"] == status_filter]
99
+
100
+ return filtered
101
+
102
+
103
+ def create_leaderboard_ui():
104
+ """Create the Gradio UI for the leaderboard."""
105
+
106
+ # Load initial data
107
+ df = load_data()
108
+ formatted_df = format_dataframe(df)
109
+
110
+ with gr.Blocks(title="Transformers.js Benchmark Leaderboard") as demo:
111
+ # Cache raw data in Gradio state to avoid reloading on every filter change
112
+ raw_data_state = gr.State(df)
113
+ gr.Markdown("# 🏆 Transformers.js Benchmark Leaderboard")
114
+ gr.Markdown(
115
+ "Compare benchmark results for different models, platforms, and configurations."
116
+ )
117
+
118
+ if not HF_DATASET_REPO:
119
+ gr.Markdown(
120
+ "⚠️ **HF_DATASET_REPO not configured.** "
121
+ "Please set the environment variable to load benchmark data."
122
+ )
123
+
124
+ gr.Markdown(
125
+ "💡 **Tip:** Use the recommended models section below to find popular models "
126
+ "that are fast to load and quick to run - perfect for getting started!"
127
+ )
128
+
129
+ # Recommended models section
130
+ gr.Markdown("## ⭐ Recommended WebGPU Models for Beginners")
131
+ gr.Markdown(
132
+ "These models are selected for being:\n"
133
+ "- **WebGPU compatible** - Work in modern browsers with GPU acceleration\n"
134
+ "- **Beginner-friendly** - Popular, fast to load, and quick to run\n"
135
+ "- Sorted by task type, showing top 3-5 models per task"
136
+ )
137
+
138
+ # Get recommended models
139
+ recommended_models = get_webgpu_beginner_friendly_models(df, limit_per_task=5)
140
+ formatted_recommended = format_dataframe(recommended_models)
141
+ markdown_output = format_recommended_models_as_markdown(recommended_models)
142
+
143
+ recommended_table = gr.DataFrame(
144
+ value=formatted_recommended,
145
+ label="Top WebGPU-Compatible Models by Task",
146
+ interactive=False,
147
+ wrap=True,
148
+ )
149
+
150
+ gr.Markdown("### 📝 Markdown Output for llms.txt")
151
+ gr.Markdown(
152
+ "Copy the markdown below to embed in your llms.txt or documentation:"
153
+ )
154
+
155
+ markdown_textbox = gr.Textbox(
156
+ value=markdown_output,
157
+ label="Markdown for llms.txt",
158
+ lines=20,
159
+ max_lines=30,
160
+ show_copy_button=True,
161
+ interactive=False,
162
+ )
163
+
164
+ gr.Markdown("---")
165
+ gr.Markdown("## 🔍 Full Benchmark Results")
166
+
167
+ with gr.Row():
168
+ refresh_btn = gr.Button("🔄 Refresh Data", variant="primary")
169
+
170
+ with gr.Row():
171
+ model_filter = gr.Textbox(
172
+ label="Model Name",
173
+ placeholder="Filter by model name (e.g., 'bert', 'gpt')",
174
+ )
175
+ task_filter = gr.Dropdown(
176
+ label="Task",
177
+ choices=get_unique_values(df, "task"),
178
+ value="All",
179
+ )
180
+
181
+ with gr.Row():
182
+ platform_filter = gr.Dropdown(
183
+ label="Platform",
184
+ choices=get_unique_values(df, "platform"),
185
+ value="All",
186
+ )
187
+ device_filter = gr.Dropdown(
188
+ label="Device",
189
+ choices=get_unique_values(df, "device"),
190
+ value="All",
191
+ )
192
+
193
+ with gr.Row():
194
+ mode_filter = gr.Dropdown(
195
+ label="Mode",
196
+ choices=get_unique_values(df, "mode"),
197
+ value="All",
198
+ )
199
+ dtype_filter = gr.Dropdown(
200
+ label="DType",
201
+ choices=get_unique_values(df, "dtype"),
202
+ value="All",
203
+ )
204
+ status_filter = gr.Dropdown(
205
+ label="Status",
206
+ choices=get_unique_values(df, "status"),
207
+ value="All",
208
+ )
209
+
210
+ results_table = gr.DataFrame(
211
+ value=formatted_df,
212
+ label="All Benchmark Results",
213
+ interactive=False,
214
+ wrap=True,
215
+ )
216
+
217
+ gr.Markdown("### 📊 Metrics")
218
+ gr.Markdown(
219
+ "**Benchmark Metrics:**\n"
220
+ "- **load_ms**: Model loading time in milliseconds\n"
221
+ "- **first_infer_ms**: First inference time in milliseconds\n"
222
+ "- **subsequent_infer_ms**: Subsequent inference time in milliseconds\n"
223
+ "- **p50/p90**: 50th and 90th percentile values\n\n"
224
+ "**HuggingFace Metrics:**\n"
225
+ "- **downloads**: Total downloads from HuggingFace Hub\n"
226
+ "- **likes**: Number of likes on HuggingFace Hub\n\n"
227
+ "**WebGPU Compatibility:**\n"
228
+ "- Models in the recommended section are all WebGPU compatible\n"
229
+ "- WebGPU enables GPU acceleration in modern browsers"
230
+ )
231
+
232
+ def update_data():
233
+ """Reload data from HuggingFace."""
234
+ new_df = load_data()
235
+ formatted_new_df = format_dataframe(new_df)
236
+
237
+ # Update recommended models
238
+ new_recommended = get_webgpu_beginner_friendly_models(new_df, limit_per_task=5)
239
+ formatted_new_recommended = format_dataframe(new_recommended)
240
+ new_markdown = format_recommended_models_as_markdown(new_recommended)
241
+
242
+ return (
243
+ new_df, # Update cached raw data
244
+ formatted_new_recommended, # Update recommended models
245
+ new_markdown, # Update markdown output
246
+ formatted_new_df,
247
+ gr.update(choices=get_unique_values(new_df, "task")),
248
+ gr.update(choices=get_unique_values(new_df, "platform")),
249
+ gr.update(choices=get_unique_values(new_df, "device")),
250
+ gr.update(choices=get_unique_values(new_df, "mode")),
251
+ gr.update(choices=get_unique_values(new_df, "dtype")),
252
+ gr.update(choices=get_unique_values(new_df, "status")),
253
+ )
254
+
255
+ def apply_filters(raw_df, model, task, platform, device, mode, dtype, status):
256
+ """Apply filters and return filtered DataFrame."""
257
+ # Use cached raw data instead of reloading
258
+ filtered = filter_data(raw_df, model, task, platform, device, mode, dtype, status)
259
+ return format_dataframe(filtered)
260
+
261
+ # Refresh button updates data and resets filters
262
+ refresh_btn.click(
263
+ fn=update_data,
264
+ outputs=[
265
+ raw_data_state,
266
+ recommended_table,
267
+ markdown_textbox,
268
+ results_table,
269
+ task_filter,
270
+ platform_filter,
271
+ device_filter,
272
+ mode_filter,
273
+ dtype_filter,
274
+ status_filter,
275
+ ],
276
+ )
277
+
278
+ # Filter inputs update the table (using cached raw data)
279
+ filter_inputs = [
280
+ raw_data_state,
281
+ model_filter,
282
+ task_filter,
283
+ platform_filter,
284
+ device_filter,
285
+ mode_filter,
286
+ dtype_filter,
287
+ status_filter,
288
+ ]
289
+
290
+ model_filter.change(
291
+ fn=apply_filters,
292
+ inputs=filter_inputs,
293
+ outputs=results_table,
294
+ )
295
+ task_filter.change(
296
+ fn=apply_filters,
297
+ inputs=filter_inputs,
298
+ outputs=results_table,
299
+ )
300
+ platform_filter.change(
301
+ fn=apply_filters,
302
+ inputs=filter_inputs,
303
+ outputs=results_table,
304
+ )
305
+ device_filter.change(
306
+ fn=apply_filters,
307
+ inputs=filter_inputs,
308
+ outputs=results_table,
309
+ )
310
+ mode_filter.change(
311
+ fn=apply_filters,
312
+ inputs=filter_inputs,
313
+ outputs=results_table,
314
+ )
315
+ dtype_filter.change(
316
+ fn=apply_filters,
317
+ inputs=filter_inputs,
318
+ outputs=results_table,
319
+ )
320
+ status_filter.change(
321
+ fn=apply_filters,
322
+ inputs=filter_inputs,
323
+ outputs=results_table,
324
+ )
325
+
326
+ return demo
327
+
328
+
329
+ demo = create_leaderboard_ui()
330
+ demo.launch(server_name="0.0.0.0", server_port=7861)
src/leaderboard/data_loader.py ADDED
@@ -0,0 +1,820 @@
1
+ """
2
+ Data loader module for loading benchmark results from HuggingFace Dataset.
3
+ """
4
+
5
+ import json
6
+ import logging
7
+ from pathlib import Path
8
+ from typing import List, Dict, Any, Optional
9
+ from datetime import datetime
10
+ import pandas as pd
11
+ from huggingface_hub import snapshot_download, list_models
12
+
13
+ logger = logging.getLogger(__name__)
14
+
15
+
16
+ def load_benchmark_data(
17
+ dataset_repo: str,
18
+ token: Optional[str] = None,
19
+ ) -> pd.DataFrame:
20
+ """Load benchmark data from HuggingFace Dataset repository.
21
+
22
+ Args:
23
+ dataset_repo: HuggingFace dataset repository ID (e.g., "username/dataset-name")
24
+ token: HuggingFace API token (optional, for private datasets)
25
+
26
+ Returns:
27
+ DataFrame containing all benchmark results
28
+ """
29
+ if not dataset_repo:
30
+ return pd.DataFrame()
31
+
32
+ try:
33
+ # Download the entire repository snapshot
34
+ logger.info(f"Downloading dataset snapshot from {dataset_repo}...")
35
+ local_dir = snapshot_download(
36
+ repo_id=dataset_repo,
37
+ repo_type="dataset",
38
+ token=token,
39
+ )
40
+ logger.info(f"Dataset downloaded to {local_dir}")
41
+
42
+ # Find all JSON files in the downloaded directory
43
+ local_path = Path(local_dir)
44
+ json_files = list(local_path.rglob("*.json"))
45
+
46
+ if not json_files:
47
+ logger.warning("No JSON files found in dataset")
48
+ return pd.DataFrame()
49
+
50
+ logger.info(f"Found {len(json_files)} JSON files")
51
+
52
+ # Load all benchmark results
53
+ all_results = []
54
+ for file_path in json_files:
55
+ try:
56
+ with open(file_path, "r") as f:
57
+ result = json.load(f)
58
+
59
+ if result:
60
+ flattened = flatten_result(result)
61
+ all_results.append(flattened)
62
+ except Exception as e:
63
+ logger.error(f"Error loading {file_path}: {e}")
64
+ continue
65
+
66
+ if not all_results:
67
+ return pd.DataFrame()
68
+
69
+ logger.info(f"Loaded {len(all_results)} benchmark results")
70
+
71
+ # Convert to DataFrame
72
+ df = pd.DataFrame(all_results)
73
+
74
+ # Enrich with HuggingFace model metadata
75
+ df = enrich_with_hf_metadata(df)
76
+
77
+ # Add first-timer-friendly score
78
+ df = add_first_timer_score(df)
79
+
80
+ # Sort by model name and timestamp
81
+ if "modelId" in df.columns and "timestamp" in df.columns:
82
+ df = df.sort_values(["modelId", "timestamp"], ascending=[True, False])
83
+
84
+ return df
85
+
86
+ except Exception as e:
87
+ logger.error(f"Error loading benchmark data: {e}")
88
+ return pd.DataFrame()
89
+
90
+
91
+ def flatten_result(result: Dict[str, Any]) -> Dict[str, Any]:
92
+ """Flatten nested benchmark result for display.
93
+
94
+ The HF Dataset format is already flattened by the bench service,
95
+ so we just need to extract the relevant fields.
96
+
97
+ Args:
98
+ result: Raw benchmark result dictionary
99
+
100
+ Returns:
101
+ Flattened dictionary with extracted fields
102
+ """
103
+ # Convert timestamp from milliseconds to datetime
104
+ timestamp_ms = result.get("timestamp", 0)
105
+ timestamp_dt = None
106
+ if timestamp_ms:
107
+ try:
108
+ timestamp_dt = datetime.fromtimestamp(timestamp_ms / 1000)
109
+ except (ValueError, OSError):
110
+ timestamp_dt = None
111
+
112
+ # Determine actual status - if there's an error, it should be "failed"
113
+ status = result.get("status", "")
114
+ if "error" in result:
115
+ status = "failed"
116
+
117
+ flat = {
118
+ "id": result.get("id", ""),
119
+ "platform": result.get("platform", ""),
120
+ "modelId": result.get("modelId", ""),
121
+ "task": result.get("task", ""),
122
+ "mode": result.get("mode", ""),
123
+ "repeats": result.get("repeats", 0),
124
+ "batchSize": result.get("batchSize", 0),
125
+ "device": result.get("device", ""),
126
+ "browser": result.get("browser", ""),
127
+ "dtype": result.get("dtype", ""),
128
+ "headed": result.get("headed", False),
129
+ "status": status,
130
+ "timestamp": timestamp_dt,
131
+ "runtime": result.get("runtime", ""),
132
+ # Initialize metric fields with None (will be filled if metrics exist)
133
+ "load_ms_p50": None,
134
+ "load_ms_p90": None,
135
+ "first_infer_ms_p50": None,
136
+ "first_infer_ms_p90": None,
137
+ "subsequent_infer_ms_p50": None,
138
+ "subsequent_infer_ms_p90": None,
139
+ }
140
+
141
+ # Extract metrics if available (already at top level)
142
+ if "metrics" in result:
143
+ metrics = result["metrics"]
144
+
145
+ # Load time
146
+ if "load_ms" in metrics and "p50" in metrics["load_ms"]:
147
+ flat["load_ms_p50"] = metrics["load_ms"]["p50"]
148
+ flat["load_ms_p90"] = metrics["load_ms"]["p90"]
149
+
150
+ # First inference time
151
+ if "first_infer_ms" in metrics and "p50" in metrics["first_infer_ms"]:
152
+ flat["first_infer_ms_p50"] = metrics["first_infer_ms"]["p50"]
153
+ flat["first_infer_ms_p90"] = metrics["first_infer_ms"]["p90"]
154
+
155
+ # Subsequent inference time
156
+ if "subsequent_infer_ms" in metrics and "p50" in metrics["subsequent_infer_ms"]:
157
+ flat["subsequent_infer_ms_p50"] = metrics["subsequent_infer_ms"]["p50"]
158
+ flat["subsequent_infer_ms_p90"] = metrics["subsequent_infer_ms"]["p90"]
159
+
160
+ # Extract environment info (already at top level)
161
+ if "environment" in result:
162
+ env = result["environment"]
163
+ flat["cpuCores"] = env.get("cpuCores", 0)
164
+ if "memory" in env:
165
+ flat["memory_gb"] = env["memory"].get("deviceMemory", 0)
166
+
167
+ # Calculate duration
168
+ if "completedAt" in result and "startedAt" in result:
169
+ flat["duration_s"] = (result["completedAt"] - result["startedAt"]) / 1000
170
+
171
+ return flat
172
+
173
+
174
+ def enrich_with_hf_metadata(df: pd.DataFrame) -> pd.DataFrame:
175
+ """Enrich benchmark data with HuggingFace model metadata (downloads, likes).
176
+
177
+ Args:
178
+ df: DataFrame containing benchmark results
180
+
181
+ Returns:
182
+ DataFrame with added downloads and likes columns
183
+ """
184
+ if df.empty or "modelId" not in df.columns:
185
+ return df
186
+
187
+ # Get unique model IDs
188
+ model_ids = df["modelId"].unique().tolist()
189
+
190
+ # Fetch metadata for all models
191
+ model_metadata = {}
192
+ logger.info(f"Fetching metadata for {len(model_ids)} models from HuggingFace...")
193
+
194
+ try:
195
+ for model in list_models(filter=["transformers.js"]):
196
+ if model.id in model_ids:
197
+ model_metadata[model.id] = {
198
+ "downloads": model.downloads or 0,
199
+ "likes": model.likes or 0,
200
+ }
201
+
202
+ # Break early if we have all models
203
+ if len(model_metadata) == len(model_ids):
204
+ break
205
+
206
+ except Exception as e:
207
+ logger.error(f"Error fetching HuggingFace metadata: {e}")
208
+
209
+ # Add metadata to dataframe
210
+ df["downloads"] = df["modelId"].map(lambda x: model_metadata.get(x, {}).get("downloads", 0))
211
+ df["likes"] = df["modelId"].map(lambda x: model_metadata.get(x, {}).get("likes", 0))
212
+
213
+ return df
214
+
215
+
216
+ def add_first_timer_score(df: pd.DataFrame) -> pd.DataFrame:
217
+ """Add first-timer-friendly score to all rows in the dataframe.
218
+
219
+ The score is calculated per task, normalized from 0-100 where:
220
+ - Higher score = better for first-timers
221
+ - Based on: downloads (25%), likes (15%), load time (30%), inference time (30%)
222
+
223
+ Args:
224
+ df: DataFrame containing benchmark results
225
+
226
+ Returns:
227
+ DataFrame with added 'first_timer_score' column
228
+ """
229
+ if df.empty:
230
+ return df
231
+
232
+ # Filter only successful benchmarks
233
+ filtered = df[df["status"] == "completed"].copy() if "status" in df.columns else df.copy()
234
+
235
+ if filtered.empty:
236
+ # Add empty score column for failed benchmarks
237
+ df["first_timer_score"] = None
238
+ return df
239
+
240
+ # Check if task column exists
241
+ if "task" not in filtered.columns:
242
+ df["first_timer_score"] = None
243
+ return df
244
+
245
+ # Calculate score per task
246
+ for task in filtered["task"].unique():
247
+ task_mask = filtered["task"] == task
248
+ task_df = filtered[task_mask].copy()
249
+
250
+ if task_df.empty:
251
+ continue
252
+
253
+ # Normalize metrics within this task (0-1 scale)
254
+
255
+ # Downloads score (0-1, higher is better)
256
+ if "downloads" in task_df.columns:
257
+ max_downloads = task_df["downloads"].max()
258
+ downloads_score = task_df["downloads"] / max_downloads if max_downloads > 0 else 0
259
+ else:
260
+ downloads_score = 0
261
+
262
+ # Likes score (0-1, higher is better)
263
+ if "likes" in task_df.columns:
264
+ max_likes = task_df["likes"].max()
265
+ likes_score = task_df["likes"] / max_likes if max_likes > 0 else 0
266
+ else:
267
+ likes_score = 0
268
+
269
+ # Load time score (0-1, lower time is better)
270
+ if "load_ms_p50" in task_df.columns:
271
+ max_load = task_df["load_ms_p50"].max()
272
+ load_score = 1 - (task_df["load_ms_p50"] / max_load) if max_load > 0 else 0
273
+ else:
274
+ load_score = 0
275
+
276
+ # Inference time score (0-1, lower time is better)
277
+ if "first_infer_ms_p50" in task_df.columns:
278
+ max_infer = task_df["first_infer_ms_p50"].max()
279
+ infer_score = 1 - (task_df["first_infer_ms_p50"] / max_infer) if max_infer > 0 else 0
280
+ else:
281
+ infer_score = 0
282
+
283
+ # Calculate weighted score and scale to 0-100
284
+ weighted_score = (
285
+ (downloads_score * 0.25) +
286
+ (likes_score * 0.15) +
287
+ (load_score * 0.30) +
288
+ (infer_score * 0.30)
289
+ ) * 100
290
+
291
+ # Assign scores back to the filtered dataframe
292
+ filtered.loc[task_mask, "first_timer_score"] = weighted_score
293
+
294
+ # Merge scores back to original dataframe
295
+ if "first_timer_score" in filtered.columns:
296
+ df = df.merge(
297
+ filtered[["id", "first_timer_score"]],
298
+ on="id",
299
+ how="left"
300
+ )
301
+ else:
302
+ df["first_timer_score"] = None
303
+
304
+ return df
305
+
306
+
307
+ def get_first_timer_friendly_models(df: pd.DataFrame, limit_per_task: int = 3) -> pd.DataFrame:
308
+ """Identify first-timer-friendly models based on popularity and performance, grouped by task.
309
+
310
+ A model is considered first-timer-friendly if it:
311
+ - Has high downloads (popular)
312
+ - Has fast load times (easy to start)
313
+ - Has fast inference times (quick results)
314
+ - Successfully completed benchmarks
315
+
316
+ Args:
317
+ df: DataFrame containing benchmark results
318
+ limit_per_task: Maximum number of models to return per task
319
+
320
+ Returns:
321
+ DataFrame with top first-timer-friendly models per task
322
+ """
323
+ if df.empty:
324
+ return pd.DataFrame()
325
+
326
+ # Filter only successful benchmarks
327
+ filtered = df[df["status"] == "completed"].copy() if "status" in df.columns else df.copy()
328
+
329
+ if filtered.empty:
330
+ return pd.DataFrame()
331
+
332
+ # Check if task column exists
333
+ if "task" not in filtered.columns:
334
+ logger.warning("Task column not found in dataframe")
335
+ return pd.DataFrame()
336
+
337
+ # Calculate first-timer-friendliness score per task
338
+ all_results = []
339
+
340
+ for task in filtered["task"].unique():
341
+ task_df = filtered[filtered["task"] == task].copy()
342
+
343
+ if task_df.empty:
344
+ continue
345
+
346
+ # Normalize metrics within this task (lower is better for times, higher is better for popularity)
347
+
348
+ # Downloads score (0-1, higher is better)
349
+ if "downloads" in task_df.columns:
350
+ max_downloads = task_df["downloads"].max()
351
+ task_df["downloads_score"] = task_df["downloads"] / max_downloads if max_downloads > 0 else 0
352
+ else:
353
+ task_df["downloads_score"] = 0
354
+
355
+ # Likes score (0-1, higher is better)
356
+ if "likes" in task_df.columns:
357
+ max_likes = task_df["likes"].max()
358
+ task_df["likes_score"] = task_df["likes"] / max_likes if max_likes > 0 else 0
359
+ else:
360
+ task_df["likes_score"] = 0
361
+
362
+ # Load time score (0-1, lower time is better)
363
+ if "load_ms_p50" in task_df.columns:
364
+ max_load = task_df["load_ms_p50"].max()
365
+ task_df["load_score"] = 1 - (task_df["load_ms_p50"] / max_load) if max_load > 0 else 0
366
+ else:
367
+ task_df["load_score"] = 0
368
+
369
+ # Inference time score (0-1, lower time is better)
370
+ if "first_infer_ms_p50" in task_df.columns:
371
+ max_infer = task_df["first_infer_ms_p50"].max()
372
+ task_df["infer_score"] = 1 - (task_df["first_infer_ms_p50"] / max_infer) if max_infer > 0 else 0
373
+ else:
374
+ task_df["infer_score"] = 0
375
+
376
+ # Calculate weighted first-timer-friendliness score
377
+ # Weights: popularity (40%), load time (30%), inference time (30%)
378
+ task_df["first_timer_score"] = (
379
+ (task_df["downloads_score"] * 0.25) +
380
+ (task_df["likes_score"] * 0.15) +
381
+ (task_df["load_score"] * 0.30) +
382
+ (task_df["infer_score"] * 0.30)
383
+ )
384
+
385
+ # Group by model and take best score for each model within this task
386
+ # Filter out NaN scores before getting idxmax
387
+ idx_max_series = task_df.groupby("modelId")["first_timer_score"].idxmax()
388
+ # Drop NaN indices
389
+ valid_indices = idx_max_series.dropna()
390
+ if valid_indices.empty:
391
+ continue
392
+ best_per_model = task_df.loc[valid_indices]
393
+
394
+ # Sort by first-timer score and take top N for this task
395
+ top_for_task = best_per_model.sort_values("first_timer_score", ascending=False).head(limit_per_task)
396
+
397
+ # Drop intermediate scoring columns
398
+ score_cols = ["downloads_score", "likes_score", "load_score", "infer_score", "first_timer_score"]
399
+ top_for_task = top_for_task.drop(columns=[col for col in score_cols if col in top_for_task.columns])
400
+
401
+ all_results.append(top_for_task)
402
+
403
+ if not all_results:
404
+ return pd.DataFrame()
405
+
406
+ # Combine all results
407
+ result = pd.concat(all_results, ignore_index=True)
408
+
409
+ # Sort by task name for better organization
410
+ if "task" in result.columns:
411
+ result = result.sort_values("task")
412
+
413
+ return result
414
+
415
+
416
+ def get_webgpu_beginner_friendly_models(
417
+ df: pd.DataFrame,
418
+ limit_per_task: int = 5
419
+ ) -> pd.DataFrame:
420
+ """Get top beginner-friendly models that are WebGPU compatible, grouped by task.
421
+
422
+ A model is included if it:
423
+ - Has high first_timer_score (popular, fast to load, fast inference)
424
+ - Has successful WebGPU benchmark results (device=webgpu, status=completed)
425
+
426
+ Args:
427
+ df: DataFrame containing benchmark results
428
+ limit_per_task: Maximum number of models to return per task (default: 5)
429
+
430
+ Returns:
431
+ DataFrame with top WebGPU-compatible beginner-friendly models per task
432
+ """
433
+ if df.empty:
434
+ return pd.DataFrame()
435
+
436
+ # Check that the required columns exist before building the filter
437
+ if "device" not in df.columns or "status" not in df.columns:
438
+ logger.warning("Required columns (device, status) not found in dataframe")
439
+ return pd.DataFrame()
440
+
441
+ # Filter for WebGPU benchmarks that completed successfully
442
+ webgpu_filter = (
443
+ (df["device"] == "webgpu") &
444
+ (df["status"] == "completed")
445
+ )
446
+
447
+ filtered = df[webgpu_filter].copy()
448
+
449
+ if filtered.empty:
450
+ logger.warning("No successful WebGPU benchmarks found")
451
+ return pd.DataFrame()
452
+
453
+ # Check if required columns exist
454
+ if "task" not in filtered.columns or "first_timer_score" not in filtered.columns:
455
+ logger.warning("Required columns (task, first_timer_score) not found in filtered dataframe")
456
+ return pd.DataFrame()
457
+
458
+ # Group by task and get top models
459
+ all_results = []
460
+
461
+ for task in filtered["task"].unique():
462
+ task_df = filtered[filtered["task"] == task].copy()
463
+
464
+ if task_df.empty:
465
+ continue
466
+
467
+ # Remove rows with NaN first_timer_score
468
+ task_df = task_df.dropna(subset=["first_timer_score"])
469
+
470
+ if task_df.empty:
471
+ continue
472
+
473
+ # For each model, get the benchmark with the highest first_timer_score
474
+ idx_max_series = task_df.groupby("modelId")["first_timer_score"].idxmax()
475
+ valid_indices = idx_max_series.dropna()
476
+
477
+ if valid_indices.empty:
478
+ continue
479
+
480
+ best_per_model = task_df.loc[valid_indices]
481
+
482
+ # Sort by first_timer_score (descending) and take top N
483
+ top_for_task = best_per_model.sort_values(
484
+ "first_timer_score",
485
+ ascending=False
486
+ ).head(limit_per_task)
487
+
488
+ all_results.append(top_for_task)
489
+
490
+ if not all_results:
491
+ logger.warning("No models found after filtering and grouping")
492
+ return pd.DataFrame()
493
+
494
+ # Combine all results
495
+ result = pd.concat(all_results, ignore_index=True)
496
+
497
+ # Sort by task, then by first_timer_score (descending)
498
+ if "task" in result.columns and "first_timer_score" in result.columns:
499
+ result = result.sort_values(
500
+ ["task", "first_timer_score"],
501
+ ascending=[True, False]
502
+ )
503
+
504
+ return result
505
+
506
+
507
+ def _get_usage_example(task_type: str, repo_id: str) -> tuple[str, str | None]:
508
+ """Get usage example code snippet for a given task type.
509
+
510
+ Args:
511
+ task_type: The task type (e.g., 'text-generation', 'image-classification')
512
+ repo_id: The model repository ID (e.g., 'Xenova/gpt2')
513
+
514
+ Returns:
515
+ Tuple of (code_snippet, description)
516
+ """
517
+ if task_type == "fill-mask":
518
+ return f"""const unmasker = await pipeline('fill-mask', '{repo_id}');
519
+ const output = await unmasker('The goal of life is [MASK].');
520
+ """, 'Perform masked language modelling (a.k.a. "fill-mask")'
521
+ elif task_type == "question-answering":
522
+ return f"""const answerer = await pipeline('question-answering', '{repo_id}');
523
+ const question = 'Who was Jim Henson?';
524
+ const context = 'Jim Henson was a nice puppet.';
525
+ const output = await answerer(question, context);
526
+ """, 'Run question answering'
527
+ elif task_type == "summarization":
528
+ return f"""const generator = await pipeline('summarization', '{repo_id}');
529
+ const text = 'The tower is 324 metres (1,063 ft) tall, about the same height as an 81-storey building, ' +
530
+ 'and the tallest structure in Paris. Its base is square, measuring 125 metres (410 ft) on each side. ' +
531
+ 'During its construction, the Eiffel Tower surpassed the Washington Monument to become the tallest ' +
532
+ 'man-made structure in the world, a title it held for 41 years until the Chrysler Building in New ' +
533
+ 'York City was finished in 1930. It was the first structure to reach a height of 300 metres. Due to ' +
534
+ 'the addition of a broadcasting aerial at the top of the tower in 1957, it is now taller than the ' +
535
+ 'Chrysler Building by 5.2 metres (17 ft). Excluding transmitters, the Eiffel Tower is the second ' +
536
+ 'tallest free-standing structure in France after the Millau Viaduct.';
537
+ const output = await generator(text, {{
538
+ max_new_tokens: 100,
539
+ }});
540
+ """, 'Summarization'
541
+ elif task_type == "sentiment-analysis" or task_type == "text-classification":
542
+ return f"""const classifier = await pipeline('{task_type}', '{repo_id}');
543
+ const output = await classifier('I love transformers!');
544
+ """, None
545
+ elif task_type == "text-generation":
546
+ return f"""const generator = await pipeline('text-generation', '{repo_id}');
547
+ const output = await generator('Once upon a time, there was', {{ max_new_tokens: 10 }});
548
+ """, 'Text generation'
549
+ elif task_type == "text2text-generation":
550
+ return f"""const generator = await pipeline('text2text-generation', '{repo_id}');
551
+ const output = await generator('how can I become more healthy?', {{
552
+ max_new_tokens: 100,
553
+ }});
554
+ """, 'Text-to-text generation'
555
+ elif task_type == "token-classification" or task_type == "ner":
556
+ return f"""const classifier = await pipeline('token-classification', '{repo_id}');
557
+ const output = await classifier('My name is Sarah and I live in London');
558
+ """, 'Perform named entity recognition'
559
+ elif task_type == "translation":
560
+ return f"""const translator = await pipeline('translation', '{repo_id}');
561
+ const output = await translator('Life is like a box of chocolate.', {{
562
+ src_lang: '...',
563
+ tgt_lang: '...',
564
+ }});
565
+ """, 'Multilingual translation'
566
+ elif task_type == "zero-shot-classification":
567
+ return f"""const classifier = await pipeline('zero-shot-classification', '{repo_id}');
568
+ const output = await classifier(
569
+ 'I love transformers!',
570
+ ['positive', 'negative']
571
+ );
572
+ """, 'Zero shot classification'
573
+ elif task_type == "feature-extraction":
574
+ return f"""const extractor = await pipeline('feature-extraction', '{repo_id}');
575
+ const output = await extractor('This is a simple test.');
576
+ """, 'Run feature extraction'
577
+ # Vision
578
+ elif task_type == "background-removal":
579
+ return f"""const segmenter = await pipeline('background-removal', '{repo_id}');
580
+ const url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/portrait-of-woman_small.jpg';
581
+ const output = await segmenter(url);
582
+ """, 'Perform background removal'
583
+ elif task_type == "depth-estimation":
584
+ return f"""const depth_estimator = await pipeline('depth-estimation', '{repo_id}');
585
+ const url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/cats.jpg';
586
+ const out = await depth_estimator(url);
587
+ """, 'Depth estimation'
588
+ elif task_type == "image-classification":
589
+ return f"""const classifier = await pipeline('image-classification', '{repo_id}');
590
+ const url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/tiger.jpg';
591
+ const output = await classifier(url);
592
+ """, 'Classify an image'
593
+ elif task_type == "image-segmentation":
594
+ return f"""const segmenter = await pipeline('image-segmentation', '{repo_id}');
595
+ const url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/cats.jpg';
596
+ const output = await segmenter(url);
597
+ """, 'Perform image segmentation'
598
+ elif task_type == "image-to-image":
599
+ return f"""const processor = await pipeline('image-to-image', '{repo_id}');
600
+ const url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/tiger.jpg';
601
+ const output = await processor(url);
602
+ """, None
603
+ elif task_type == "object-detection":
604
+ return f"""const detector = await pipeline('object-detection', '{repo_id}');
605
+ const img = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/cats.jpg';
606
+ const output = await detector(img, {{ threshold: 0.9 }});
607
+ """, 'Run object-detection'
608
+ elif task_type == "image-feature-extraction":
609
+ return f"""const image_feature_extractor = await pipeline('image-feature-extraction', '{repo_id}');
610
+ const url = 'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/cats.png';
611
+ const features = await image_feature_extractor(url);
612
+ """, 'Perform image feature extraction'
613
+ # Audio
614
+ elif task_type == "audio-classification":
615
+ return f"""const classifier = await pipeline('audio-classification', '{repo_id}');
616
+ const url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/jfk.wav';
617
+ const output = await classifier(url);
618
+ """, 'Perform audio classification'
619
+ elif task_type == "automatic-speech-recognition":
620
+ return f"""const transcriber = await pipeline('automatic-speech-recognition', '{repo_id}');
621
+ const url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/jfk.wav';
622
+ const output = await transcriber(url);
623
+ """, 'Transcribe audio from a URL'
624
+ elif task_type == "text-to-audio" or task_type == "text-to-speech":
625
+ return f"""const synthesizer = await pipeline('text-to-speech', '{repo_id}');
626
+ const output = await synthesizer('Hello, my dog is cute');
627
+ """, 'Generate audio from text'
628
+ # Multimodal
629
+ elif task_type == "document-question-answering":
630
+ return f"""const qa_pipeline = await pipeline('document-question-answering', '{repo_id}');
631
+ const image = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/invoice.png';
632
+ const question = 'What is the invoice number?';
633
+ const output = await qa_pipeline(image, question);
634
+ """, 'Answer questions about a document'
635
+ elif task_type == "image-to-text":
636
+ return f"""const captioner = await pipeline('image-to-text', '{repo_id}');
637
+ const url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/cats.jpg';
638
+ const output = await captioner(url);
639
+ """, 'Generate a caption for an image'
640
+ elif task_type == "zero-shot-audio-classification":
641
+ return f"""const classifier = await pipeline('zero-shot-audio-classification', '{repo_id}');
642
+ const audio = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/dog_barking.wav';
643
+ const candidate_labels = ['dog', 'vacuum cleaner'];
644
+ const scores = await classifier(audio, candidate_labels);
645
+ """, 'Perform zero-shot audio classification'
646
+ elif task_type == "zero-shot-image-classification":
647
+ return f"""const classifier = await pipeline('zero-shot-image-classification', '{repo_id}');
648
+ const url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/tiger.jpg';
649
+ const output = await classifier(url, ['tiger', 'horse', 'dog']);
650
+ """, 'Zero shot image classification'
651
+ elif task_type == "zero-shot-object-detection":
652
+ return f"""const detector = await pipeline('zero-shot-object-detection', '{repo_id}');
653
+ const url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/astronaut.png';
654
+ const candidate_labels = ['human face', 'rocket', 'helmet', 'american flag'];
655
+ const output = await detector(url, candidate_labels);
656
+ """, 'Zero-shot object detection'
657
+ else:
658
+ logger.warning(f"No usage example found for task type: {task_type}")
659
+ return f"""const pipe = await pipeline('{task_type}', '{repo_id}');
660
+ const result = await pipe('input text or data');
661
+ console.log(result);
662
+ """, None
663
+
664
+
665
+ def format_recommended_models_as_markdown(df: pd.DataFrame) -> str:
666
+ """Format recommended WebGPU models as markdown for llms.txt embedding.
667
+
668
+ Args:
669
+ df: DataFrame containing recommended models (output from get_webgpu_beginner_friendly_models)
670
+
671
+ Returns:
672
+ Formatted markdown string
673
+ """
674
+ if df.empty:
675
+ return "No recommended models available."
676
+
677
+ markdown_lines = [
678
+ "# Recommended Transformers.js Models for First-Time Trials",
679
+ "",
680
+ "This guide provides curated model recommendations for each task type, selected for their:",
681
+ "- **Popularity**: Widely used with strong community support",
682
+ "- **Performance**: Fast loading and inference times",
683
+ "- **WebGPU Compatibility**: GPU-accelerated in modern browsers",
684
+ "",
685
+ "**Important:** These recommendations are designed for initial experimentation and learning. "
686
+ "Many other models are available for each task. "
687
+ "**You should evaluate and choose the best model for your specific use case, performance requirements, and constraints.**",
688
+ "",
689
+ "---",
690
+ "",
691
+ "## About the Model Recommendations",
692
+ "",
693
+ "The models below are selected for their popularity and ease of use, making them ideal for initial experimentation. "
694
+ "**This list does not cover all available models** - you should evaluate and select the best model for your specific use case and requirements.",
695
+ "",
696
+ ]
697
+
698
+ # Group by task
699
+ if "task" not in df.columns:
700
+ return "No task information available."
701
+
702
+ for task in sorted(df["task"].unique()):
703
+ task_df = df[df["task"] == task].copy()
704
+
705
+ if task_df.empty:
706
+ continue
707
+
708
+ # Add task header
709
+ markdown_lines.append(f"## {task.title()}")
710
+ markdown_lines.append("")
711
+
712
+ # Sort by first_timer_score descending
713
+ if "first_timer_score" in task_df.columns:
714
+ task_df = task_df.sort_values("first_timer_score", ascending=False)
715
+
716
+ # Get the first/best model for the usage example
717
+ first_row = task_df.iloc[0]
718
+ first_model_id = first_row.get("modelId", "")
719
+
720
+ # Add usage example using the top model
721
+ if first_model_id:
722
+ code_snippet, description = _get_usage_example(task, first_model_id)
723
+
724
+ if description:
725
+ markdown_lines.append(f"**Usage Example:** {description}")
726
+ else:
727
+ markdown_lines.append("**Usage Example:**")
728
+ markdown_lines.append("")
729
+ markdown_lines.append("```javascript")
730
+ markdown_lines.append(code_snippet.strip())
731
+ markdown_lines.append("```")
732
+ markdown_lines.append("")
733
+
734
+ # Add section header for model recommendations
735
+ markdown_lines.append("### Recommended Models for First-Time Trials")
736
+ markdown_lines.append("")
737
+
738
+ # Add each model
739
+ for idx, row in task_df.iterrows():
740
+ model_id = row.get("modelId", "Unknown")
741
+ score = row.get("first_timer_score", None)
742
+ downloads = row.get("downloads", 0)
743
+ likes = row.get("likes", 0)
744
+ load_time = row.get("load_ms_p50", None)
745
+ infer_time = row.get("first_infer_ms_p50", None)
746
+
747
+ # Model entry
748
+ markdown_lines.append(f"### {model_id}")
749
+ markdown_lines.append("")
750
+
751
+ # WebGPU compatibility
752
+ markdown_lines.append("**WebGPU Compatible:** ✅ Yes")
753
+ markdown_lines.append("")
754
+
755
+ # Metrics
756
+ metrics = []
757
+ if load_time is not None:
758
+ metrics.append(f"Load: {load_time:.1f}ms")
759
+ if infer_time is not None:
760
+ metrics.append(f"Inference: {infer_time:.1f}ms")
761
+ if downloads:
762
+ if downloads >= 1_000_000:
763
+ downloads_str = f"{downloads / 1_000_000:.1f}M"
764
+ elif downloads >= 1_000:
765
+ downloads_str = f"{downloads / 1_000:.1f}k"
766
+ else:
767
+ downloads_str = str(downloads)
768
+ metrics.append(f"Downloads: {downloads_str}")
769
+ if likes:
770
+ metrics.append(f"Likes: {likes}")
771
+
772
+ if metrics:
773
+ markdown_lines.append(f"**Metrics:** {' | '.join(metrics)}")
774
+
775
+ markdown_lines.append("")
776
+
777
+ markdown_lines.append("---")
778
+ markdown_lines.append("")
779
+
780
+ # Add footer
781
+ markdown_lines.extend([
782
+ "## About These Recommendations",
783
+ "",
784
+ "### Selection Criteria",
785
+ "",
786
+ "Models in this guide are selected based on:",
787
+ "- **Popularity**: High download counts and community engagement on HuggingFace Hub",
788
+ "- **Performance**: Fast loading and inference times based on benchmark results",
789
+ "- **Compatibility**: Verified WebGPU support for GPU-accelerated browser execution",
790
+ "",
791
+ "### For Production Use",
792
+ "",
793
+ "These recommendations are optimized for first-time trials and learning. "
794
+ "For production applications, consider:",
795
+ "- Evaluating multiple models for your specific use case",
796
+ "- Testing with your actual data and performance requirements",
797
+ "- Reviewing the full benchmark results for comprehensive comparisons",
798
+ "- Exploring specialized models that may better fit your needs",
799
+ "",
800
+ "Visit the full leaderboard to explore all available models and their benchmark results.",
801
+ ])
802
+
803
+ return "\n".join(markdown_lines)
804
+
805
+
806
+ def get_unique_values(df: pd.DataFrame, column: str) -> List[str]:
807
+ """Get unique values from a column for dropdown choices.
808
+
809
+ Args:
810
+ df: DataFrame to extract values from
811
+ column: Column name
812
+
813
+ Returns:
814
+ List of unique values with "All" as first item
815
+ """
816
+ if df.empty or column not in df.columns:
817
+ return ["All"]
818
+
819
+ values = df[column].dropna().unique().tolist()
820
+ return ["All"] + sorted([str(v) for v in values])
src/leaderboard/formatters.py ADDED
@@ -0,0 +1,346 @@
1
+ """
2
+ Formatting utilities for displaying benchmark data with emojis.
3
+ """
4
+
5
+ from typing import Any, Optional
6
+ from datetime import datetime
7
+
8
+
9
+ def format_platform(platform: str) -> str:
10
+ """Format platform with emoji."""
11
+ emoji_map = {
12
+ "node": "🟢",
13
+ "web": "🌐",
14
+ }
15
+ emoji = emoji_map.get(platform, "")
16
+ return f"{emoji} {platform}" if emoji else platform
17
+
18
+
19
+ def format_device(device: str) -> str:
20
+ """Format device with emoji."""
21
+ emoji_map = {
22
+ "wasm": "📦",
23
+ "webgpu": "⚡",
24
+ "cpu": "🖥️",
25
+ "cuda": "🎮",
26
+ }
27
+ emoji = emoji_map.get(device, "")
28
+ return f"{emoji} {device}" if emoji else device
29
+
30
+
31
+ def format_browser(browser: str) -> str:
32
+ """Format browser with emoji."""
33
+ if not browser:
34
+ return ""
35
+
36
+ emoji_map = {
37
+ "chromium": "🔵",
38
+ "chrome": "🔵",
39
+ "firefox": "🦊",
40
+ "webkit": "🧭",
41
+ "safari": "🧭",
42
+ }
43
+ emoji = emoji_map.get(browser.lower(), "")
44
+ return f"{emoji} {browser}" if emoji else browser
45
+
46
+
47
+ def format_status(status: str) -> str:
48
+ """Format status with emoji."""
49
+ emoji_map = {
50
+ "completed": "✅",
51
+ "failed": "❌",
52
+ "running": "🔄",
53
+ "pending": "⏳",
54
+ }
55
+ emoji = emoji_map.get(status, "")
56
+ return f"{emoji} {status}" if emoji else status
57
+
58
+
59
+ def format_mode(mode: str) -> str:
60
+ """Format mode with emoji."""
61
+ emoji_map = {
62
+ "warm": "🔥",
63
+ "cold": "❄️",
64
+ }
65
+ emoji = emoji_map.get(mode, "")
66
+ return f"{emoji} {mode}" if emoji else mode
67
+
68
+
69
+ def format_headed(headed: bool) -> str:
70
+ """Format headed mode with emoji."""
71
+ return "👁️ Yes" if headed else "No"
72
+
73
+
74
+ def format_metric_ms(value: Optional[float], metric_type: str = "inference") -> str:
75
+ """Format metric in milliseconds with performance emoji.
76
+
77
+ Args:
78
+ value: Metric value in milliseconds
79
+ metric_type: Type of metric ('load', 'inference')
80
+
81
+ Returns:
82
+ Formatted string with emoji
83
+ """
84
+ if value is None or value == 0:
85
+ return "-"
86
+
87
+ # Different thresholds for different metric types
88
+ if metric_type == "load":
89
+ # Load time thresholds (in ms)
90
+ if value < 100:
91
+ emoji = "🚀" # Very fast
92
+ elif value < 500:
93
+ emoji = "⚡" # Fast
94
+ elif value < 2000:
95
+ emoji = "✅" # Good
96
+ elif value < 5000:
97
+ emoji = "⚠️" # Slow
98
+ else:
99
+ emoji = "🐌" # Very slow
100
+ else: # inference
101
+ # Inference time thresholds (in ms)
102
+ if value < 5:
103
+ emoji = "🚀" # Very fast
104
+ elif value < 20:
105
+ emoji = "⚡" # Fast
106
+ elif value < 50:
107
+ emoji = "✅" # Good
108
+ elif value < 100:
109
+ emoji = "⚠️" # Slow
110
+ else:
111
+ emoji = "🐌" # Very slow
112
+
113
+ return f"{emoji} {value:.1f}ms"
114
+
115
+
116
+ def format_duration(duration_s: Optional[float]) -> str:
117
+ """Format duration with emoji."""
118
+ if duration_s is None or duration_s == 0:
119
+ return "-"
120
+
121
+ if duration_s < 5:
122
+ emoji = "🚀" # Very fast
123
+ elif duration_s < 15:
124
+ emoji = "⚡" # Fast
125
+ elif duration_s < 60:
126
+ emoji = "✅" # Good
127
+ elif duration_s < 300:
128
+ emoji = "⚠️" # Slow
129
+ else:
130
+ emoji = "🐌" # Very slow
131
+
132
+ return f"{emoji} {duration_s:.1f}s"
133
+
134
+
135
+ def format_memory(memory_gb: Optional[int]) -> str:
136
+ """Format memory with emoji."""
137
+ if memory_gb is None or memory_gb == 0:
138
+ return "-"
139
+
140
+ if memory_gb >= 32:
141
+ emoji = "💪" # High
142
+ elif memory_gb >= 16:
143
+ emoji = "✅" # Good
144
+ elif memory_gb >= 8:
145
+ emoji = "⚠️" # Medium
146
+ else:
147
+ emoji = "📉" # Low
148
+
149
+ return f"{emoji} {memory_gb}GB"
150
+
151
+
152
+ def format_cpu_cores(cores: Optional[int]) -> str:
153
+ """Format CPU cores with emoji."""
154
+ if cores is None or cores == 0:
155
+ return "-"
156
+
157
+ if cores >= 16:
158
+ emoji = "💪" # Many
159
+ elif cores >= 8:
160
+ emoji = "✅" # Good
161
+ elif cores >= 4:
162
+ emoji = "⚠️" # Medium
163
+ else:
164
+ emoji = "📉" # Few
165
+
166
+ return f"{emoji} {cores} cores"
167
+
168
+
169
+ def format_timestamp(timestamp: Optional[datetime]) -> str:
170
+ """Format timestamp as datetime string.
171
+
172
+ Args:
173
+ timestamp: datetime object
174
+
175
+ Returns:
176
+ Formatted datetime string
177
+ """
178
+ if timestamp is None:
179
+ return "-"
180
+
181
+ try:
182
+ # Format as readable datetime
183
+ return timestamp.strftime("%Y-%m-%d %H:%M:%S")
184
+ except (ValueError, AttributeError):
185
+ return str(timestamp)
186
+
187
+
188
+ def format_downloads(downloads: Optional[int]) -> str:
189
+ """Format downloads count with emoji.
190
+
191
+ Args:
192
+ downloads: Number of downloads
193
+
194
+ Returns:
195
+ Formatted string with emoji
196
+ """
197
+ if downloads is None or downloads == 0:
198
+ return "-"
199
+
200
+ # Format large numbers
201
+ if downloads >= 1_000_000:
202
+ formatted = f"{downloads / 1_000_000:.1f}M"
203
+ emoji = "🔥" # Very popular
204
+ elif downloads >= 100_000:
205
+ formatted = f"{downloads / 1_000:.0f}k"
206
+ emoji = "⭐" # Popular
207
+ elif downloads >= 10_000:
208
+ formatted = f"{downloads / 1_000:.1f}k"
209
+ emoji = "✨" # Well-known
210
+ elif downloads >= 1_000:
211
+ formatted = f"{downloads / 1_000:.1f}k"
212
+ emoji = "📊" # Moderate
213
+ else:
214
+ formatted = str(downloads)
215
+ emoji = "📈" # New/niche
216
+
217
+ return f"{emoji} {formatted}"
218
+
219
+
220
+ def format_likes(likes: Optional[int]) -> str:
221
+ """Format likes count with emoji.
222
+
223
+ Args:
224
+ likes: Number of likes
225
+
226
+ Returns:
227
+ Formatted string with emoji
228
+ """
229
+ if likes is None or likes == 0:
230
+ return "-"
231
+
232
+ # Format based on popularity
233
+ if likes >= 1000:
234
+ emoji = "💖" # Very popular
235
+ elif likes >= 100:
236
+ emoji = "❤️" # Popular
237
+ elif likes >= 50:
238
+ emoji = "💙" # Well-liked
239
+ elif likes >= 10:
240
+ emoji = "💚" # Moderate
241
+ else:
242
+ emoji = "🤍" # Few likes
243
+
244
+ return f"{emoji} {likes}"
245
+
246
+
247
+ def format_first_timer_score(score: Optional[float]) -> str:
248
+ """Format first-timer-friendly score with emoji.
249
+
250
+ Args:
251
+ score: First-timer score (0-100)
252
+
253
+ Returns:
254
+ Formatted string with emoji
255
+ """
256
+ if score is None:
257
+ return "-"
258
+
259
+ # Format based on score (0-100 scale)
260
+ if score >= 80:
261
+ emoji = "⭐⭐⭐" # Excellent
262
+ elif score >= 60:
263
+ emoji = "⭐⭐" # Good
264
+ elif score >= 40:
265
+ emoji = "⭐" # Fair
266
+ else:
267
+ emoji = "·" # Below average
268
+
269
+ return f"{emoji} {score:.0f}"
270
+
271
+
272
+ def apply_formatting(df_dict: dict) -> dict:
273
+ """Apply emoji formatting to a benchmark result dictionary.
274
+
275
+ Args:
276
+ df_dict: Dictionary containing benchmark data (one row)
277
+
278
+ Returns:
279
+ Dictionary with formatted values
280
+ """
281
+ formatted = df_dict.copy()
282
+
283
+ # Format categorical fields
284
+ if "platform" in formatted:
285
+ formatted["platform"] = format_platform(formatted["platform"])
286
+
287
+ if "device" in formatted:
288
+ formatted["device"] = format_device(formatted["device"])
289
+
290
+ if "browser" in formatted:
291
+ formatted["browser"] = format_browser(formatted["browser"])
292
+
293
+ if "status" in formatted:
294
+ formatted["status"] = format_status(formatted["status"])
295
+
296
+ if "mode" in formatted:
297
+ formatted["mode"] = format_mode(formatted["mode"])
298
+
299
+ if "headed" in formatted:
300
+ formatted["headed"] = format_headed(formatted["headed"])
301
+
302
+ # Format metrics
303
+ if "load_ms_p50" in formatted:
304
+ formatted["load_ms_p50"] = format_metric_ms(formatted["load_ms_p50"], "load")
305
+
306
+ if "load_ms_p90" in formatted:
307
+ formatted["load_ms_p90"] = format_metric_ms(formatted["load_ms_p90"], "load")
308
+
309
+ if "first_infer_ms_p50" in formatted:
310
+ formatted["first_infer_ms_p50"] = format_metric_ms(formatted["first_infer_ms_p50"], "inference")
311
+
312
+ if "first_infer_ms_p90" in formatted:
313
+ formatted["first_infer_ms_p90"] = format_metric_ms(formatted["first_infer_ms_p90"], "inference")
314
+
315
+ if "subsequent_infer_ms_p50" in formatted:
316
+ formatted["subsequent_infer_ms_p50"] = format_metric_ms(formatted["subsequent_infer_ms_p50"], "inference")
317
+
318
+ if "subsequent_infer_ms_p90" in formatted:
319
+ formatted["subsequent_infer_ms_p90"] = format_metric_ms(formatted["subsequent_infer_ms_p90"], "inference")
320
+
321
+ # Format environment info
322
+ if "memory_gb" in formatted:
323
+ formatted["memory_gb"] = format_memory(formatted["memory_gb"])
324
+
325
+ if "cpuCores" in formatted:
326
+ formatted["cpuCores"] = format_cpu_cores(formatted["cpuCores"])
327
+
328
+ if "duration_s" in formatted:
329
+ formatted["duration_s"] = format_duration(formatted["duration_s"])
330
+
331
+ # Format timestamp
332
+ if "timestamp" in formatted:
333
+ formatted["timestamp"] = format_timestamp(formatted["timestamp"])
334
+
335
+ # Format HuggingFace metadata
336
+ if "downloads" in formatted:
337
+ formatted["downloads"] = format_downloads(formatted["downloads"])
338
+
339
+ if "likes" in formatted:
340
+ formatted["likes"] = format_likes(formatted["likes"])
341
+
342
+ # Format first-timer score
343
+ if "first_timer_score" in formatted:
344
+ formatted["first_timer_score"] = format_first_timer_score(formatted["first_timer_score"])
345
+
346
+ return formatted
uv.lock ADDED
The diff for this file is too large to render. See raw diff