Spaces: whitphx/transformersjs-performance-leaderboard-backend


whitphx HF Staff committed 15 days ago

Commit

f6c05e9

1 Parent(s): 11d0b50

id serialization

Files changed (5)

.gitignore +1 -0
bench/docs/benchmark-id-format.md +175 -0
bench/src/core/benchmark-id.ts +230 -0
bench/src/server/index.ts +11 -1
bench/src/server/storage.ts +106 -8

.gitignore CHANGED

@@ -136,3 +136,4 @@ bench-web/.transformers-cache
 # Benchmark result files
 bench/benchmark-results.jsonl
+bench/benchmark-results/

bench/docs/benchmark-id-format.md ADDED

	@@ -0,0 +1,175 @@

+# Benchmark ID and File Organization
+## Overview
+Benchmark results are organized using a deterministic ID system that groups results with identical settings into the same file. The ID is structured hierarchically with task at the top level, followed by model ID, and finally encoded parameters.
+## Directory Structure
+Results are stored in nested directories with task as the top level:
+```
+benchmark-results/
+├── {task}/
+│   └── {org}/
+│       └── {model-name}/
+│           ├── {params1}.jsonl
+│           ├── {params2}.jsonl
+│           └── {params3}.jsonl
+```
+## ID Format
+**Full ID**: `{task}/{modelId}/{platform}_{mode}_{device}_{dtype}_{batch}_{browser}_{headed}`
+### Components
+1. **Task** (top-level directory): The transformers task
+   - Examples: `feature-extraction`, `text-classification`, `text-generation`, `sentiment-analysis`
+   - Rationale: Tasks are fundamentally different operations, so they form the primary organization
+2. **Model ID** (nested directory path): Full model identifier with organization
+   - Examples: `Xenova/all-MiniLM-L6-v2`, `meta-llama/Llama-2-7b`
+   - Preserved as-is, including slashes for directory structure
+3. **Platform**: `node` or `web`
+4. **Mode**: `warm` or `cold`
+5. **Device**: Execution device
+   - Node.js: `cpu` (default), `webgpu`
+   - Web: `webgpu` (default), `wasm`
+6. **DType** (optional): Model data type
+   - Examples: `fp32`, `fp16`, `q8`, `q4`, `int8`
+   - Omitted if not specified
+7. **Batch Size**: Always included as `b{N}`
+   - Examples: `b1`, `b4`, `b32`
+8. **Browser** (web only): Browser type
+   - Examples: `chromium`, `firefox`, `webkit`
+   - Omitted for Node.js benchmarks
+9. **Headed** (web only): Display mode
+   - Included as `headed` only if true
+   - Omitted for headless mode or Node.js benchmarks
+## Examples
+### Node.js Benchmarks
+```
+feature-extraction/Xenova/all-MiniLM-L6-v2/node_warm_cpu_fp32_b1.jsonl
+feature-extraction/Xenova/all-MiniLM-L6-v2/node_warm_webgpu_fp16_b4.jsonl
+feature-extraction/Xenova/all-MiniLM-L6-v2/node_cold_cpu_b1.jsonl
+text-generation/meta-llama/Llama-2-7b/node_warm_cpu_q4_b1.jsonl
+```
+### Web Benchmarks
+```
+feature-extraction/Xenova/distilbert-base-uncased/web_warm_wasm_b1_chromium.jsonl
+feature-extraction/Xenova/distilbert-base-uncased/web_warm_wasm_q8_b1_firefox.jsonl
+feature-extraction/Xenova/distilbert-base-uncased/web_warm_webgpu_fp16_b1_chromium_headed.jsonl
+feature-extraction/Xenova/roberta-large-mnli/web_cold_wasm_b1_chromium.jsonl
+```
+### Mixed Tasks and Models
+```
+benchmark-results/
+├── feature-extraction/
+│   └── Xenova/
+│       ├── all-MiniLM-L6-v2/
+│       │   ├── node_warm_cpu_fp32_b1.jsonl
+│       │   └── web_warm_wasm_b1_chromium.jsonl
+│       └── distilbert-base-uncased/
+│           └── node_warm_webgpu_fp16_b1.jsonl
+└── text-classification/
+    └── Xenova/
+        └── distilbert-base-uncased/
+            └── node_warm_cpu_fp32_b1.jsonl
+```
+## File Format
+Each file is in JSONL (JSON Lines) format, with one benchmark result per line. This allows:
+- Appending new results without parsing the entire file
+- Streaming large result sets
+- Easy analysis with tools like `jq`
+Example:
+```jsonl
+{"id":"uuid1","platform":"node","modelId":"Xenova/all-MiniLM-L6-v2","task":"feature-extraction",...}
+{"id":"uuid2","platform":"node","modelId":"Xenova/all-MiniLM-L6-v2","task":"feature-extraction",...}
+{"id":"uuid3","platform":"node","modelId":"Xenova/all-MiniLM-L6-v2","task":"feature-extraction",...}
+```
+## Querying Results
+### API Endpoints
+1. **Get all results**:
+   ```bash
+   curl http://localhost:7860/api/benchmarks
+   ```
+2. **Get results by model**:
+   ```bash
+   curl "http://localhost:7860/api/benchmarks?modelId=Xenova/all-MiniLM-L6-v2"
+   ```
+3. **Get specific benchmark**:
+   ```bash
+   curl http://localhost:7860/api/benchmark/{uuid}
+   ```
+### Direct File Access
+Results can also be queried directly from the filesystem:
+```bash
+# All results for a specific task (in bash, `**` requires `shopt -s globstar`)
+cat benchmark-results/feature-extraction/**/*.jsonl | jq
+# All results for a specific model across all tasks
+cat benchmark-results/*/Xenova/all-MiniLM-L6-v2/*.jsonl | jq
+# All results for a specific model and task
+cat benchmark-results/feature-extraction/Xenova/all-MiniLM-L6-v2/*.jsonl | jq
+# Specific configuration
+cat benchmark-results/feature-extraction/Xenova/all-MiniLM-L6-v2/node_warm_cpu_fp32_b1.jsonl | jq
+# Count results per configuration
+wc -l benchmark-results/feature-extraction/Xenova/all-MiniLM-L6-v2/*.jsonl
+# Filter by device across all models
+cat benchmark-results/feature-extraction/*/*/web_*_wasm_*.jsonl | jq
+# Compare same model across different tasks
+cat benchmark-results/*/Xenova/distilbert-base-uncased/node_warm_cpu_fp32_b1.jsonl | jq
+```
+## Benefits
+1. **Task-First Organization**: Primary organization by task type, as models are typically designed for specific tasks
+2. **Grouping**: Multiple runs with identical settings are stored together in JSONL files
+3. **Easy Comparison**: Compare different models on the same task, or same model across different tasks
+4. **Organization**: Clear hierarchy: task → org → model → configurations
+5. **Readability**: Filenames are human-readable and self-documenting
+6. **Searchability**: Easy to find specific configurations using filesystem tools and glob patterns
+7. **Scalability**: Nested directory structure handles thousands of models and tasks
+8. **Model ID Preservation**: Full model IDs maintained without sanitization, preserving org/model structure
+## Configuration
+The base directory can be customized via environment variable:
+```bash
+export BENCHMARK_RESULTS_DIR=/path/to/results
+npm run server
+```
+Default: `./benchmark-results`
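
The File Format section of the document above describes one JSON object per line. As a minimal sketch of reading such a file from code, the hypothetical helper below (`parseJsonl` is not part of this commit) parses JSONL text, skips blank lines, and counts lines that are not valid JSON, such as a partially written last line. The sample records are invented for illustration.

```typescript
// Hypothetical helper, not part of the commit: parse JSONL text into records.
interface ParsedJsonl<T> {
  records: T[];
  skipped: number; // number of non-empty lines that were not valid JSON
}

function parseJsonl<T = unknown>(text: string): ParsedJsonl<T> {
  const records: T[] = [];
  let skipped = 0;
  for (const rawLine of text.split("\n")) {
    const line = rawLine.trim();
    if (line.length === 0) continue; // ignore blank lines and the trailing newline
    try {
      records.push(JSON.parse(line) as T);
    } catch {
      skipped++; // e.g. a line that was cut off mid-write
    }
  }
  return { records, skipped };
}

// Illustrative input: two complete records and one truncated line.
const sample =
  '{"id":"uuid1","platform":"node"}\n' +
  '{"id":"uuid2","platform":"node"}\n' +
  '{"id":"uuid3","plat';
const parsed = parseJsonl<{ id: string; platform: string }>(sample);
console.log(parsed.records.length, parsed.skipped); // 2 1
```

Counting skipped lines instead of throwing is a design choice for this sketch; the storage class in this commit calls `JSON.parse` on every line directly.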

bench/src/core/benchmark-id.ts ADDED

	@@ -0,0 +1,230 @@

+/**
+ * Benchmark ID Generator
+ *
+ * Creates human-readable, deterministic IDs from benchmark settings that:
+ * 1. Group results with identical configurations
+ * 2. Use task and model ID as directory structure (e.g., "feature-extraction/Xenova/all-MiniLM-L6-v2/...")
+ * 3. Encode other parameters as filename
+ * 4. Are sortable and searchable
+ */
+export interface BenchmarkSettings {
+  platform: "node" | "web";
+  modelId: string;
+  task: string;
+  mode: "warm" | "cold";
+  device?: string;
+  dtype?: string;
+  batchSize?: number;
+  browser?: string;
+  headed?: boolean;
+}
+/**
+ * Generate a benchmark ID path from settings
+ *
+ * Format: {task}/{modelId}/{platform}_{mode}_{device}_{dtype}_{batch}_{browser}_{headed}
+ *
+ * Examples:
+ * - "feature-extraction/Xenova/all-MiniLM-L6-v2/node_warm_cpu_fp32_b1"
+ * - "feature-extraction/Xenova/distilbert-base-uncased/web_warm_wasm_q8_b1_chromium"
+ * - "text-generation/meta-llama/Llama-2-7b/web_cold_webgpu_fp16_b4_firefox_headed"
+ *
+ * The path can be used to create directories and files:
+ * - Directory: {task}/{modelId}/
+ * - Filename: {platform}_{mode}_{device}_{dtype}_{batch}_{browser}_{headed}.jsonl
+ */
+export function generateBenchmarkId(settings: BenchmarkSettings): string {
+  // Task at top level
+  const task = settings.task;
+  // Model ID is preserved as-is (with slashes for directory structure)
+  const modelId = settings.modelId;
+  // Generate filename parts from other settings (excluding task since it's in the directory)
+  const filenameParts = generateFilenameParts(settings);
+  // Combine: task/modelId/filename
+  return `${task}/${modelId}/${filenameParts.join("_")}`;
+}
+/**
+ * Generate the filename parts (everything except task and model ID)
+ */
+function generateFilenameParts(settings: BenchmarkSettings): string[] {
+  const parts: string[] = [];
+  // 1. Platform (node/web)
+  parts.push(settings.platform);
+  // 2. Mode (warm/cold)
+  parts.push(settings.mode);
+  // 3. Device
+  if (settings.device) {
+    parts.push(settings.device);
+  } else if (settings.platform === "node") {
+    parts.push("cpu"); // default for node
+  } else {
+    parts.push("webgpu"); // default for web
+  }
+  // 4. DType (if specified)
+  if (settings.dtype) {
+    parts.push(settings.dtype);
+  }
+  // 5. Batch size (always include for consistency)
+  const batchSize = settings.batchSize || 1;
+  parts.push(`b${batchSize}`);
+  // 6. Browser (for web platform)
+  if (settings.platform === "web" && settings.browser) {
+    parts.push(settings.browser);
+  }
+  // 7. Headed mode (for web platform, only if true)
+  if (settings.platform === "web" && settings.headed) {
+    parts.push("headed");
+  }
+  return parts;
+}
+/**
+ * Generate a filesystem path for storing benchmark results
+ * Returns e.g.: { dir: "feature-extraction/Xenova/all-MiniLM-L6-v2", filename: "node_warm_cpu_fp32_b1.jsonl", fullPath: "feature-extraction/Xenova/all-MiniLM-L6-v2/node_warm_cpu_fp32_b1.jsonl" }
+ */
+export function generateBenchmarkPath(settings: BenchmarkSettings): { dir: string; filename: string; fullPath: string } {
+  const dir = `${settings.task}/${settings.modelId}`;
+  const filenameParts = generateFilenameParts(settings);
+  const filename = `${filenameParts.join("_")}.jsonl`;
+  const fullPath = `${dir}/${filename}`;
+  return { dir, filename, fullPath };
+}
+/**
+ * Parse a benchmark ID path back into settings (best effort)
+ * This is useful for filtering and querying
+ *
+ * Example: "feature-extraction/Xenova/all-MiniLM-L6-v2/node_warm_cpu_fp32_b1"
+ */
+export function parseBenchmarkId(id: string): Partial<BenchmarkSettings> {
+  const settings: Partial<BenchmarkSettings> = {};
+  // Split into parts
+  const pathParts = id.split("/");
+  if (pathParts.length < 4) {
+    return settings; // Invalid ID - need at least task/org/model/filename
+  }
+  // Extract task (first part)
+  settings.task = pathParts[0];
+  // Extract model ID (everything between the first slash and the last slash)
+  // Example: ["feature-extraction", "Xenova", "all-MiniLM-L6-v2", "node_warm_cpu_fp32_b1"]
+  // modelId should be "Xenova/all-MiniLM-L6-v2"
+  const lastSlashIdx = id.lastIndexOf("/");
+  const taskLength = settings.task.length + 1; // +1 for the slash
+  settings.modelId = id.substring(taskLength, lastSlashIdx);
+  // Extract filename parts (everything after the last slash)
+  const filenamePart = id.substring(lastSlashIdx + 1);
+  const parts = filenamePart.split("_");
+  if (parts.length < 3) {
+    return settings; // Invalid filename format
+  }
+  let idx = 0;
+  // Platform
+  if (parts[idx] === "node" || parts[idx] === "web") {
+    settings.platform = parts[idx] as "node" | "web";
+    idx++;
+  }
+  // Mode
+  if (idx < parts.length && (parts[idx] === "warm" || parts[idx] === "cold")) {
+    settings.mode = parts[idx] as "warm" | "cold";
+    idx++;
+  }
+  // Device (might be cpu, webgpu, wasm)
+  if (idx < parts.length && ["cpu", "webgpu", "wasm"].includes(parts[idx])) {
+    settings.device = parts[idx];
+    idx++;
+  }
+  // DType
+  if (idx < parts.length && ["fp32", "fp16", "q8", "q4", "int8", "uint8", "bnb4", "q4f16"].includes(parts[idx])) {
+    settings.dtype = parts[idx];
+    idx++;
+  }
+  // Batch size
+  if (idx < parts.length && parts[idx].startsWith("b")) {
+    const batch = parseInt(parts[idx].substring(1), 10);
+    if (!isNaN(batch)) {
+      settings.batchSize = batch;
+      idx++;
+    }
+  }
+  // Browser
+  if (idx < parts.length && ["chromium", "firefox", "webkit"].includes(parts[idx])) {
+    settings.browser = parts[idx];
+    idx++;
+  }
+  // Headed
+  if (idx < parts.length && parts[idx] === "headed") {
+    settings.headed = true;
+    idx++;
+  }
+  return settings;
+}
+/**
+ * Generate a human-readable display name from settings
+ */
+export function generateDisplayName(settings: BenchmarkSettings): string {
+  const parts: string[] = [];
+  // Model name
+  parts.push(settings.modelId);
+  // Task
+  parts.push(`(${settings.task})`);
+  // Platform and device
+  if (settings.platform === "web") {
+    parts.push(`[${settings.browser || "browser"}/${settings.device || "webgpu"}]`);
+  } else {
+    parts.push(`[node/${settings.device || "cpu"}]`);
+  }
+  // Mode
+  parts.push(settings.mode);
+  // DType if specified
+  if (settings.dtype) {
+    parts.push(settings.dtype);
+  }
+  // Batch size if not 1
+  const batchSize = settings.batchSize || 1;
+  if (batchSize !== 1) {
+    parts.push(`batch=${batchSize}`);
+  }
+  // Headed if true
+  if (settings.headed) {
+    parts.push("headed");
+  }
+  return parts.join(" ");
+}
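
Since `generateBenchmarkId` and `generateBenchmarkPath` interpolate `modelId` into the path as-is, a caller that accepts model IDs from outside may want to check them first. The following is only a sketch of such a check, under the assumption that a valid ID is a sequence of non-empty segments separated by `/`. The helper `isSafeModelId` is hypothetical and does not exist in this commit.

```typescript
// Hypothetical validation helper; the commit itself uses modelId unchanged.
function isSafeModelId(modelId: string): boolean {
  // Reject empty IDs and Windows-style separators outright.
  if (modelId.length === 0 || modelId.includes("\\")) return false;
  // Every slash-separated segment must be non-empty and must not be "." or "..".
  return modelId
    .split("/")
    .every((segment) => segment.length > 0 && segment !== "." && segment !== "..");
}

console.log(isSafeModelId("Xenova/all-MiniLM-L6-v2")); // true
console.log(isSafeModelId("../../etc"));               // false
console.log(isSafeModelId("/absolute/path"));          // false (leading empty segment)
```

The same check could be applied to `task`, which is also used as a directory name.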

bench/src/server/index.ts CHANGED

@@ -132,9 +132,19 @@ app.get("/api/benchmark/:id", async (c) => {
 /**
  * GET /api/benchmarks
  * Get all benchmark results from storage
+ * Query params:
+ * - modelId: Filter by model ID
  */
 app.get("/api/benchmarks", async (c) => {
-  const results = await storage.getAllResults();
+  const modelId = c.req.query("modelId");
+  let results;
+  if (modelId) {
+    results = await storage.getResultsByModel(modelId);
+  } else {
+    results = await storage.getAllResults();
+  }
   return c.json({
     total: results.length,
     results,
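
The `modelId` query parameter contains a slash, so a client building the request URL in code may prefer to let the URL API handle the encoding. The sketch below assumes the base URL from the curl examples in the docs. `buildBenchmarksUrl` is a hypothetical client helper, not part of this commit.

```typescript
// Hypothetical client helper; the base URL is taken from the curl examples in the docs.
function buildBenchmarksUrl(baseUrl: string, modelId?: string): string {
  const url = new URL("/api/benchmarks", baseUrl);
  if (modelId) {
    // searchParams percent-encodes the slash in the model ID as %2F.
    url.searchParams.set("modelId", modelId);
  }
  return url.toString();
}

console.log(buildBenchmarksUrl("http://localhost:7860"));
// http://localhost:7860/api/benchmarks
console.log(buildBenchmarksUrl("http://localhost:7860", "Xenova/all-MiniLM-L6-v2"));
// http://localhost:7860/api/benchmarks?modelId=Xenova%2Fall-MiniLM-L6-v2
```

The resulting string can be passed to `fetch`. According to the handler above, the response body is a JSON object with `total` and `results` fields.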

bench/src/server/storage.ts CHANGED

@@ -1,24 +1,55 @@
 import fs from "fs/promises";
 import path from "path";
 import { QueuedBenchmark } from "./queue.js";
 export class BenchmarkStorage {
-  private filePath: string;
-  constructor(filePath?: string) {
     // Use environment variable if set, otherwise fall back to default
-    const defaultPath = process.env.BENCHMARK_RESULTS_PATH || "./benchmark-results.jsonl";
-    this.filePath = path.resolve(filePath || defaultPath);
   }
   async appendResult(benchmark: QueuedBenchmark): Promise<void> {
     const line = JSON.stringify(benchmark) + "\n";
-    await fs.appendFile(this.filePath, line, "utf-8");
   }
-  async getAllResults(): Promise<QueuedBenchmark[]> {
     try {
-      const content = await fs.readFile(this.filePath, "utf-8");
       const lines = content.trim().split("\n").filter(line => line.length > 0);
       return lines.map(line => JSON.parse(line));
     } catch (error: any) {
@@ -29,14 +60,81 @@ export class BenchmarkStorage {
     }
   }
   async getResultById(id: string): Promise<QueuedBenchmark | undefined> {
     const results = await this.getAllResults();
     return results.find(r => r.id === id);
   }
   async clearResults(): Promise<void> {
     try {
-      await fs.unlink(this.filePath);
     } catch (error: any) {
       if (error.code !== "ENOENT") {
         throw error;

 import fs from "fs/promises";
 import path from "path";
 import { QueuedBenchmark } from "./queue.js";
+import { generateBenchmarkPath, type BenchmarkSettings } from "../core/benchmark-id.js";
 export class BenchmarkStorage {
+  private baseDir: string;
+  constructor(baseDir?: string) {
     // Use environment variable if set, otherwise fall back to default
+    const defaultDir = process.env.BENCHMARK_RESULTS_DIR || "./benchmark-results";
+    this.baseDir = path.resolve(baseDir || defaultDir);
+  }
+  /**
+   * Get the file path for a benchmark based on its settings
+   */
+  private getBenchmarkFilePath(benchmark: QueuedBenchmark): string {
+    const settings: BenchmarkSettings = {
+      platform: benchmark.platform,
+      modelId: benchmark.modelId,
+      task: benchmark.task,
+      mode: benchmark.mode,
+      device: benchmark.device,
+      dtype: benchmark.dtype,
+      batchSize: benchmark.batchSize,
+      browser: benchmark.browser,
+      headed: benchmark.headed,
+    };
+    const { dir, filename } = generateBenchmarkPath(settings);
+    return path.join(this.baseDir, dir, filename);
   }
   async appendResult(benchmark: QueuedBenchmark): Promise<void> {
+    const filePath = this.getBenchmarkFilePath(benchmark);
+    const dir = path.dirname(filePath);
+    // Ensure directory exists
+    await fs.mkdir(dir, { recursive: true });
+    // Append result as JSONL
     const line = JSON.stringify(benchmark) + "\n";
+    await fs.appendFile(filePath, line, "utf-8");
   }
+  /**
+   * Read all results from a specific JSONL file
+   */
+  private async readJsonlFile(filePath: string): Promise<QueuedBenchmark[]> {
     try {
+      const content = await fs.readFile(filePath, "utf-8");
       const lines = content.trim().split("\n").filter(line => line.length > 0);
       return lines.map(line => JSON.parse(line));
     } catch (error: any) {
     }
   }
+  /**
+   * Recursively find all JSONL files in the results directory
+   */
+  private async findAllJsonlFiles(dir: string): Promise<string[]> {
+    const files: string[] = [];
+    try {
+      const entries = await fs.readdir(dir, { withFileTypes: true });
+      for (const entry of entries) {
+        const fullPath = path.join(dir, entry.name);
+        if (entry.isDirectory()) {
+          // Recursively search subdirectories
+          const subFiles = await this.findAllJsonlFiles(fullPath);
+          files.push(...subFiles);
+        } else if (entry.isFile() && entry.name.endsWith(".jsonl")) {
+          files.push(fullPath);
+        }
+      }
+    } catch (error: any) {
+      if (error.code === "ENOENT") {
+        return []; // Directory doesn't exist yet
+      }
+      throw error;
+    }
+    return files;
+  }
+  async getAllResults(): Promise<QueuedBenchmark[]> {
+    const allFiles = await this.findAllJsonlFiles(this.baseDir);
+    const allResults: QueuedBenchmark[] = [];
+    for (const file of allFiles) {
+      const results = await this.readJsonlFile(file);
+      allResults.push(...results);
+    }
+    return allResults;
+  }
   async getResultById(id: string): Promise<QueuedBenchmark | undefined> {
     const results = await this.getAllResults();
     return results.find(r => r.id === id);
   }
+  /**
+   * Get all results for a specific benchmark configuration
+   */
+  async getResultsBySettings(settings: BenchmarkSettings): Promise<QueuedBenchmark[]> {
+    const { dir, filename } = generateBenchmarkPath(settings);
+    const filePath = path.join(this.baseDir, dir, filename);
+    return this.readJsonlFile(filePath);
+  }
+  /**
+   * Get all results for a specific model (all configurations)
+   */
+  async getResultsByModel(modelId: string): Promise<QueuedBenchmark[]> {
+    // Files live under {baseDir}/{task}/{modelId}/, so joining baseDir and modelId
+    // directly would skip the task level and match nothing. Filter on the stored
+    // modelId field instead, which covers every task.
+    const allResults = await this.getAllResults();
+    return allResults.filter(r => r.modelId === modelId);
+  }
   async clearResults(): Promise<void> {
     try {
+      await fs.rm(this.baseDir, { recursive: true, force: true });
     } catch (error: any) {
       if (error.code !== "ENOENT") {
         throw error;