3v324v23 committed
Commit 0bdbec3 · 1 Parent(s): 0be0be6

feat: add NNGen project under NNGen/ and ignore local secrets

.gitignore ADDED
@@ -0,0 +1,4 @@
+NNGen/.env
+NNGen/artifacts/
+NNGen/.venv/
+NNGen/__pycache__/
NNGen/.gitignore ADDED
@@ -0,0 +1,89 @@
+# Byte-compiled / optimized / DLL files
+__pycache__/
+*.py[cod]
+*.class
+
+# C extensions
+*.so
+
+# Distribution / packaging
+.Python
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+*.egg-info/
+.installed.cfg
+*.egg
+
+# Installer logs
+pip-log.txt
+pip-delete-this-directory.txt
+
+# Unit test / coverage reports
+htmlcov/
+.tox/
+.nox/
+.coverage
+.coverage.*
+.cache
+nosetests.xml
+coverage.xml
+*.cover
+*.py,cover
+.hypothesis/
+.pytest_cache/
+
+# Jupyter Notebook
+.ipynb_checkpoints
+
+# IPython
+profile_default/
+ipython_config.py
+
+# pyenv
+.python-version
+
+# pipenv
+Pipfile.lock
+
+# poetry
+poetry.lock
+
+# PEP 582; used by e.g. github.com/David-OConnor/pyflow
+__pypackages__/
+
+# Environments
+.env
+.venv
+venv/
+ENV/
+env/
+
+# VS Code
+.vscode/
+
+# PyCharm
+.idea/
+
+# Artifacts and outputs
+artifacts/
+artifacts/**
+*.png
+*.jpg
+*.jpeg
+
+# Secrets
+app/llm/credentials.py
+app/llm/credentials.json
+
+# Mac/Windows
+.DS_Store
+Thumbs.db
NNGen/AGENTS.md ADDED
@@ -0,0 +1,38 @@
+# Repository Guidelines
+
+## Project Structure & Module Organization
+- `app/cli.py` — CLI entry point; orchestrates a full run.
+- `app/graph.py` — lightweight pipeline runner and edit loop.
+- `app/nodes/` — individual agent nodes (`parser.py`, `planner.py`, `prompt_gen.py`, `gen_generate.py` [G1 skeleton], `gen_labels.py` [G2 labels], `judge.py`, `select.py`, `edit.py`, `archive.py`). Each exposes `run(state)` or similar.
+- `app/prompts.py` — centralized prompts for parsing/planning/generation/judging/editing.
+- `app/state.py` — typed `AppState` and artifact helpers.
+- `app/llm/gemini.py` — `call_gemini(kind, **kwargs)` wrapper; uses local placeholders if no API key.
+- `spec/` — example specs (e.g., `spec/vit.txt`).
+- `artifacts/` — run outputs (time-stamped folders with `final.png`).
+
+## Setup, Run, and Development Commands
+- Create env: `python -m venv .venv && source .venv/bin/activate` (Windows: `./.venv/Scripts/activate`).
+- Install deps: `pip install -r requirements.txt`.
+- Configure API (choose one):
+  - Env var: `export GEMINI_API_KEY=...` (supports `.env`).
+  - File: create `app/llm/credentials.py` like `credentials.example.py`.
+- Run sample: `python -m app.cli --spec spec/vit.txt --K 3 --T 1`.
+- Models: optionally set `GEMINI_MODEL`, `GEMINI_IMAGE_MODEL`, `GEMINI_IMAGE_EDIT_MODEL`.
+
+## Coding Style & Naming Conventions
+- Python 3.10+, PEP8, 4-space indentation, type hints required in public APIs.
+- Files: snake_case; functions: `snake_case`; classes: `PascalCase`.
+- Nodes are pure where possible: read from `state`, return a new `state`; side effects limited to writing under `artifacts/`.
+- Centralize prompt text in `app/prompts.py`; call models via `call_gemini` only.
+
+## Testing Guidelines
+- No formal test suite yet. Prefer pytest with files under `tests/` named `test_*.py`.
+- Minimal integration check: run the CLI and assert a `final.png` exists in the latest `artifacts/run_*` directory (a pytest sketch follows this file).
+
+## Commit & Pull Request Guidelines
+- Use Conventional Commits: `feat:`, `fix:`, `docs:`, `refactor:`, `chore:`.
+- PRs must include: purpose, linked issues, how to run, and a sample spec plus the path to a produced artifact (e.g., `artifacts/run_YYYYmmdd_HHMMSS/final.png`).
+
+## Security & Configuration Tips
+- Do not commit secrets (`.env`, `app/llm/credentials.py`). Rotate keys if exposed.
+- Large outputs live in `artifacts/`; avoid committing heavy assets unless necessary.
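
A minimal sketch of the integration check described above, assuming pytest and an offline run (no `GEMINI_API_KEY`, so the placeholder path still writes `final.png`); the module name `tests/test_cli_smoke.py` is hypothetical:

```
# tests/test_cli_smoke.py (hypothetical name) — minimal end-to-end check
import subprocess
import sys
from pathlib import Path


def test_cli_produces_final_png(tmp_path: Path) -> None:
    outdir = tmp_path / "run"
    # Without GEMINI_API_KEY the pipeline falls back to offline placeholders,
    # so the run should complete and still produce final.png.
    subprocess.run(
        [sys.executable, "-m", "app.cli",
         "--spec", "spec/vit.txt", "--K", "1", "--T", "0",
         "--outdir", str(outdir)],
        check=True,
    )
    assert (outdir / "final.png").exists()
```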
NNGen/README.md ADDED
@@ -0,0 +1,80 @@
+# Multi-Agent Neural Network Diagram Generator (Skeleton) — Gemini 2.5 Flash Image
+
+This repository is a minimal, runnable skeleton that turns a textual NN spec into a publication-style diagram via a multi-agent pipeline:
+- Parser → Planner → Prompt-Generator → Image-Generator (G1) → Label-Generator (G2) → Judge → Selector → (Editor loop) → Archivist
+- All model calls flow through `call_gemini(...)`, making it easy to use Gemini 2.5 Flash for text and Gemini 2.5 Flash Image for images.
+
+Key additions in this version:
+- Two-stage generation: G1 draws the geometry-only skeleton (no text), G2 overlays labels on top of the skeleton.
+- Hard violations: the Judge returns actionable violations; missing labels are flagged as HARD to trigger edits reliably.
+- Parallelism: G1, G2, and Judge run in parallel; set `NNG_CONCURRENCY` (default 4).
+- Remote images by default: image generate/edit use Gemini 2.5 Flash Image models. If the API key is missing, the system falls back to a local placeholder to stay runnable.
+
+## Quick Start
+
+1) Python 3.10+
+
+2) Install deps
+```
+pip install -r requirements.txt
+```
+
+3) Configure Gemini (choose one)
+- Env var: `export GEMINI_API_KEY=YOUR_KEY`
+- File: create `app/llm/credentials.py` with `GEMINI_API_KEY = "YOUR_KEY"`
+
+4) Run (K=candidates, T=max edit rounds)
+```
+# Text mode (spec -> image)
+python -m app.cli --mode text --spec spec/vit.txt --K 4 --T 1
+
+# Image mode (text + image fusion/edit)
+# Example: edit an existing diagram with a component replacement using a reference image
+python -m app.cli --mode image --base-image path/to/base.png \
+  --ref-image path/to/transformer_ref.png \
+  --instructions "Replace the UNet backbone with a Transformer (DiT); keep layout, font, and colors consistent."
```
+Artifacts are saved under `artifacts/run_YYYYmmdd_HHMMSS/` with `final.png` as the chosen result.
+
+## Gemini 2.5 Flash Image in This Project
+- G1 geometry: `gen_generate.py` calls `GEMINI_IMAGE_MODEL` (Gemini 2.5 Flash Image) to render a clean, geometry-only skeleton quickly.
+- G2 labels: `gen_labels.py` uses `GEMINI_IMAGE_EDIT_MODEL` to overlay text labels onto the G1 skeleton without redrawing everything.
+- Edit loop: `edit.py` performs targeted corrections via the same image model, enabling fast, iterative refinements instead of full regenerations.
+- Why it matters: the model’s speed and editability make multi-round diagram refinement practical while preserving layout quality.
+- Fallback: if no API key is available, the pipeline remains runnable using local placeholders generated by `app/llm/gemini.py`.
+
+## Models
+- `GEMINI_MODEL` (code default `gemini-1.5-flash`; set to e.g. `gemini-2.5-flash` if available): parsing, planning, prompt generation, and judging.
+- `GEMINI_IMAGE_MODEL` (recommended `gemini-2.5-flash-image` or `gemini-2.5-flash-image-preview`): image generation (G1).
+- `GEMINI_IMAGE_EDIT_MODEL` (recommended `gemini-2.5-flash-image` or `gemini-2.5-flash-image-preview`): image editing (G2, Editor).
+Notes: if `GEMINI_API_KEY` is not set, the pipeline uses offline placeholders to remain runnable. With an API key present, you must set valid image model env vars; errors are raised if image models are unset or calls fail (no automatic local fallback).
+
+## Fusion Mode (Text + Image)
+- Accepts a base diagram (`--base-image`) and optional reference images (`--ref-image`, repeatable) plus instructions.
+- Uses Gemini 2.5 Flash Image to compose images under textual guidance, which is ideal for swapping a module (e.g., UNet → Transformer) while preserving style and layout.
+- Outputs multiple fused candidates (`K`) and archives the first as `final.png`.
+
+## Structure
+```
+app/
+  cli.py            # CLI entry (K/T/outdir)
+  graph.py          # Orchestrator + edit loop
+  state.py          # AppState + artifacts
+  prompts.py        # Centralized prompts (parse/plan/G1/G2/judge/edit)
+  nodes/
+    parser.py, planner.py, prompt_gen.py
+    gen_generate.py # G1 skeleton images (no text)
+    gen_labels.py   # G2 label overlay edits
+    judge.py, select.py, edit.py, archive.py
+  llm/
+    gemini.py       # Unified wrapper (API + offline fallback)
+    credentials.example.py
+spec/
+  vit.txt           # Example ViT spec (English)
+artifacts/          # Outputs per run
+```
+
+## Tips
+- Concurrency: `NNG_CONCURRENCY=4 python -m app.cli --spec ...`
+- Tuning: start with `K=4, T=1`; increase `T` for more correction rounds.
+- Debug: image calls write `*.resp.txt`/`*.meta.json` alongside outputs (these can be removed later if undesired).
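
For `.json` specs passed to `--spec`, the schema is ultimately defined by the parser prompt; the offline placeholder in `app/llm/gemini.py` suggests a minimal shape like the following (shown as the equivalent Python dict; inferred, not a documented contract):

```
# Minimal spec shape inferred from the offline placeholder parser (assumption)
spec = {
    "nodes": ["Input", "Conv", "FC", "Softmax"],   # block labels, in draw order
    "edges": [[0, 1], [1, 2], [2, 3]],             # index pairs into nodes
    "constraints": {"arrows": "left_to_right"},
}
```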
NNGen/app.py ADDED
@@ -0,0 +1,9 @@
+from __future__ import annotations
+
+# Hugging Face Spaces entrypoint for Gradio
+# Exposes a global `demo` variable that HF will serve.
+
+from scripts.gradio_app import app as create_app
+
+demo = create_app()
+
NNGen/app/__init__.py ADDED
@@ -0,0 +1 @@
+# empty package marker
NNGen/app/cli.py ADDED
@@ -0,0 +1,55 @@
+from __future__ import annotations
+
+import argparse
+import json
+from pathlib import Path
+
+from .graph import run_pipeline, run_fusion_pipeline
+from .state import AppState
+
+
+def main() -> None:
+    parser = argparse.ArgumentParser(description="NN Diagram Multi-Agent Pipeline")
+    parser.add_argument("--mode", type=str, choices=["text", "image"], default="text", help="'text' (spec→image) or 'image' (text+image fusion/edit)")
+    parser.add_argument("--spec", type=str, required=False, help="Path to .txt user prompt or .json spec (text mode)")
+    parser.add_argument("--K", type=int, default=4, help="Number of candidates")
+    parser.add_argument("--T", type=int, default=1, help="Max edit rounds")
+    parser.add_argument("--outdir", type=str, default="", help="Output directory (optional)")
+    parser.add_argument("--base-image", type=str, default="", help="Base image to edit (image mode)")
+    parser.add_argument("--ref-image", action="append", default=None, help="Additional reference image(s) (repeatable)")
+    parser.add_argument("--instructions", type=str, default="", help="Edit/fusion instructions (image mode)")
+    args = parser.parse_args()
+
+    state: AppState = {"K": int(args.K), "T": int(args.T), "outdir": args.outdir or ""}
+
+    if args.mode == "text":
+        if not args.spec:
+            raise SystemExit("--spec is required in text mode")
+        spec_path = Path(args.spec)
+        if not spec_path.exists():
+            raise SystemExit(f"Spec file not found: {spec_path}")
+        if spec_path.suffix.lower() == ".json":
+            state["spec"] = json.loads(spec_path.read_text())
+        else:
+            state["user_text"] = spec_path.read_text()
+        final_state = run_pipeline(state)
+    else:
+        # image fusion/edit mode
+        base_image = args.base_image.strip()
+        ref_images = args.ref_image or []
+        if not base_image and not ref_images:
+            raise SystemExit("image mode requires --base-image and/or at least one --ref-image")
+        if base_image:
+            if not Path(base_image).exists():
+                raise SystemExit(f"Base image not found: {base_image}")
+            state["base_image"] = base_image
+        valid_refs = [p for p in ref_images if p and Path(p).exists()]
+        state["ref_images"] = valid_refs
+        state["instructions"] = args.instructions or "Compose and update the figure to reflect the requested component changes while keeping overall style consistent."
+        final_state = run_fusion_pipeline(state)
+
+    print(f"Artifacts saved under: {final_state['outdir']}")
+
+
+if __name__ == "__main__":
+    main()
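
For embedding the pipeline in other code, the CLI can be bypassed and `run_pipeline` called directly; a minimal sketch, mirroring what `main()` does in text mode:

```
# Hypothetical programmatic invocation of the text-mode pipeline
from app.graph import run_pipeline
from app.state import AppState

state: AppState = {
    "K": 2,        # number of candidates
    "T": 1,        # max edit rounds
    "outdir": "",  # empty -> a time-stamped artifacts/run_* directory is created
    "user_text": "Input -> Conv -> FC -> Softmax",
}
final_state = run_pipeline(state)
print(final_state["outdir"])  # contains final.png, spec.json, scores.json, ...
```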
NNGen/app/graph.py ADDED
@@ -0,0 +1,82 @@
+from __future__ import annotations
+
+from datetime import datetime
+from pathlib import Path
+from typing import Callable
+
+from .state import AppState
+from .nodes import parser, planner, prompt_gen, gen_generate, gen_labels, judge, select, edit, archive
+from .nodes import gen_fusion
+
+
+def run_pipeline(state: AppState) -> AppState:
+    # ensure outdir
+    outdir = Path(state.get("outdir") or _default_outdir())
+    outdir.mkdir(parents=True, exist_ok=True)
+    state["outdir"] = str(outdir)
+    state["round"] = int(state.get("round", 0))
+
+    # 1) parse → 2) plan → 3) prompts → 4) generate (skeleton) → 5) generator_2 (labels) → 6) judge → 7) select
+    state = parser.run(state)
+    state = planner.run(state)
+    state = prompt_gen.run(state)
+    state = gen_generate.run(state)
+    state = gen_labels.run(state)
+    state = judge.run(state)
+    state = select.run(state)
+
+    # 8) edit loop (if hard violations or any violations, and round < T)
+    T = int(state.get("T", 0))
+    while (state.get("hard_violations") or state.get("violations")) and state.get("round", 0) < T:
+        state["round"] = int(state.get("round", 0)) + 1
+        state = edit.apply_edits(state)
+        # re-judge best image
+        state = _judge_best_only(state)
+        state = select.run(state)
+
+    # 9) archive
+    state = archive.run(state)
+    return state
+
+
+def run_fusion_pipeline(state: AppState) -> AppState:
+    # ensure outdir
+    outdir = Path(state.get("outdir") or _default_outdir())
+    outdir.mkdir(parents=True, exist_ok=True)
+    state["outdir"] = str(outdir)
+    state["round"] = int(state.get("round", 0))
+
+    # Generate fused candidates from images + text instructions
+    state = gen_fusion.run(state)
+
+    # If we have candidates, select first as best; optionally judge later
+    if state.get("images"):
+        state["best_image"] = state["images"][0]
+
+    # Archive results (final.png etc.)
+    state = archive.run(state)
+    return state
+
+
+def _judge_best_only(state: AppState) -> AppState:
+    # Only score the current best image again
+    from .llm.gemini import call_gemini
+
+    if not state.get("best_image"):
+        return state
+    res = call_gemini("judge", image_path=state["best_image"].path, spec=state.get("spec", {}))
+    vios = list(res.get("violations", []))
+    hard = [v for v in vios if str(v).strip().lower().startswith("hard:")]
+    if not hard:
+        hard = [v for v in vios if ("labels" in str(v).lower() and "missing" in str(v).lower())]
+    state["scores"] = [{
+        "image_path": state["best_image"].path,
+        "score": float(res.get("score", 0.0)),
+        "violations": vios,
+    }]
+    state["hard_violations"] = hard
+    return state
+
+
+def _default_outdir() -> str:
+    return f"artifacts/run_{datetime.now().strftime('%Y%m%d_%H%M%S')}"
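
The edit loop hinges on a plain string convention: violations prefixed with `HARD:` (case-insensitive) are treated as blocking. A small sketch of the same classification `_judge_best_only` applies:

```
# Sketch: violation classification used by _judge_best_only
violations = ["HARD: labels: missing", "typo: layer name"]
hard = [v for v in violations if str(v).strip().lower().startswith("hard:")]
soft = [v for v in violations if v not in hard]
assert hard == ["HARD: labels: missing"] and soft == ["typo: layer name"]
```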
NNGen/app/llm/credentials.example.py ADDED
@@ -0,0 +1,4 @@
+# Copy this file to credentials.py and fill in your key.
+
+GEMINI_API_KEY = "YOUR_GEMINI_API_KEY"
+
NNGen/app/llm/gemini.py ADDED
@@ -0,0 +1,788 @@
+from __future__ import annotations
+
+import base64
+import json
+import os
+import random
+import shutil
+from pathlib import Path
+from typing import Any, Dict, List, Optional
+import os as _os
+
+try:
+    # optional local credentials file
+    from . import credentials  # type: ignore
+except Exception:
+    credentials = None  # type: ignore
+
+# Load .env if present to populate environment variables
+try:
+    from dotenv import load_dotenv  # type: ignore
+
+    load_dotenv()  # searches for .env in CWD/parents
+except Exception:
+    pass
+
+
+def _get_api_key() -> Optional[str]:
+    key = os.getenv("GEMINI_API_KEY")
+    if key:
+        return key
+    if credentials and getattr(credentials, "GEMINI_API_KEY", None):
+        return credentials.GEMINI_API_KEY  # type: ignore
+    # Optional: read from ~/.config/gemini/api_key
+    try:
+        cfg_path = Path.home() / ".config" / "gemini" / "api_key"
+        if cfg_path.exists():
+            return cfg_path.read_text().strip()
+    except Exception:
+        pass
+    return None
+
+
+def call_gemini(kind: str, **kwargs) -> Dict[str, Any]:
+    """Unified entry for Gemini calls.
+
+    kind: one of {"parse", "plan", "prompt_generate", "image_generate", "judge", "image_edit", "image_fuse"}
+    kwargs: payload for the corresponding action
+
+    If the API key is missing, falls back to deterministic local placeholders so the
+    pipeline remains runnable offline; with a key present, errors are surfaced directly.
+    """
+    api_key = _get_api_key()
+    if not api_key:
+        # Simplified behavior: if no API key, always use local placeholders
+        return _local_placeholder(kind, **kwargs)
+
+    # With an API key present, call the real service and surface errors directly
+    return _real_gemini(kind, api_key=api_key, **kwargs)
+
+
+def _local_placeholder(kind: str, **kwargs) -> Dict[str, Any]:
+    # Deterministic pseudo behavior for offline usage
+    rng = random.Random(42)
+
+    if kind == "parse":
+        user_text = kwargs.get("user_text", "")
+        # Very rough parse: split by arrows/lines → nodes & edges
+        lines = [ln.strip() for ln in user_text.splitlines() if ln.strip()]
+        nodes = [f"N{i}:{ln[:24]}" for i, ln in enumerate(lines)] or ["Input", "Conv", "FC", "Softmax"]
+        edges = [[i, i + 1] for i in range(len(nodes) - 1)]
+        spec = {"nodes": nodes, "edges": edges, "constraints": {"arrows": "left_to_right"}}
+        return {"spec": spec}
+
+    if kind == "plan":
+        spec = kwargs.get("spec", {})
+        spec_text = (
+            "Neural Net Diagram\n" +
+            f"Nodes: {len(spec.get('nodes', []))}\n" +
+            f"Edges: {len(spec.get('edges', []))}\n" +
+            f"Constraints: {spec.get('constraints', {})}\n"
+        )
+        return {"spec_text": spec_text}
+
+    if kind == "prompt_generate":
+        K = int(kwargs.get("K", 3))
+        spec_text = kwargs.get("spec_text", "")
+        layouts = ["left-right", "top-down", "circular", "grid", "hierarchical"]
+        colors = ["blue", "green", "purple", "orange", "teal"]
+        prompts = [
+            f"Draw NN diagram ({spec_text[:40]}...) layout={layouts[i % len(layouts)]} color={colors[i % len(colors)]} seed={i}"
+            for i in range(K)
+        ]
+        return {"prompts": prompts}
+
+    if kind == "image_generate":
+        prompts: List[str] = kwargs.get("prompts", [])
+        outdir: str = kwargs.get("outdir", "artifacts")
+        paths: List[str] = []
+        for i, p in enumerate(prompts):
+            pth = Path(outdir) / f"candidate_{i}.png"
+            _write_placeholder_diagram(pth, with_labels=False)
+            paths.append(str(pth))
+        return {"paths": paths}
+
+    if kind == "judge":
+        image_path: str = kwargs.get("image_path")
+        # produce a stable pseudo-score based on filename
+        base = sum(ord(c) for c in Path(image_path).name) % 100
+        score = 0.5 + (base / 200.0)
+        # fake violations: if filename has odd index
+        violations: List[str] = []
+        try:
+            idx = int(Path(image_path).stem.split("_")[-1])
+            if idx % 2 == 1:
+                violations = ["typo: layer name", "arrow: wrong direction"]
+        except Exception:
+            pass
+        # If still skeleton (no 'labeled_' in name), mark missing labels as HARD
+        name = Path(image_path).name.lower()
+        # Heuristic for offline mode: consider labeled or edited images as having labels
+        if ("labeled_" not in name) and ("edited_" not in name):
+            violations = ["HARD: labels: missing"] + violations
+        return {"score": score, "violations": violations}
+
+    if kind == "image_edit":
+        image_path: str = kwargs.get("image_path")
+        out_path: str = kwargs.get("out_path")
+        instructions: str = kwargs.get("instructions", "")
+        # Extract labels from instructions if present
+        import re
+        labels = re.findall(r'\d+\s*:\s*"([^"]+)"', instructions)
+        if not labels:
+            # fallback: quoted strings
+            labels = re.findall(r'"([^"]+)"', instructions)
+        # standardize
+        labels = [l.strip() for l in labels if l.strip()]
+        _write_placeholder_diagram(Path(out_path), with_labels=True, labels=labels)
+        return {"path": out_path}
+
+    raise ValueError(f"Unsupported kind={kind}")
+
+
+def _write_1x1_png(path: Path) -> None:
+    path.parent.mkdir(parents=True, exist_ok=True)
+    # 1x1 black pixel PNG
+    png_bytes = base64.b64decode(
+        b"iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAQAAAC1HAwCAA"
+        b"AAC0lEQVR42mP8/xcAAwMB/ax4u6kAAAAASUVORK5CYII="
+    )
+    with open(path, "wb") as f:
+        f.write(png_bytes)
+
+
+def _write_placeholder_diagram(path: Path, *, with_labels: bool, labels: Optional[List[str]] = None) -> None:
+    """Generate a simple skeleton diagram (and optionally labels).
+
+    If Pillow is available, draw with anti-aliased vectors and real text;
+    otherwise fall back to a pure-stdlib bitmap renderer.
+    """
+    try:
+        from PIL import Image, ImageDraw, ImageFont  # type: ignore
+        _write_placeholder_diagram_pil(path, with_labels=with_labels, labels=labels)
+        return
+    except Exception:
+        pass
+
+    # Fallback: stdlib bitmap renderer
+    # White background, 3px black strokes, arrows, dashed group
+    import zlib, struct, binascii
+
+    W, H = 1200, 420
+    # initialize white canvas
+    pixels: List[List[List[int]]] = [[[255, 255, 255] for _ in range(W)] for _ in range(H)]
+
+    def set_px(x: int, y: int, c: tuple[int, int, int]):
+        if 0 <= x < W and 0 <= y < H:
+            pixels[y][x][0] = c[0]
+            pixels[y][x][1] = c[1]
+            pixels[y][x][2] = c[2]
+
+    def draw_line(x0: int, y0: int, x1: int, y1: int, c=(0, 0, 0), t: int = 3):
+        dx = abs(x1 - x0)
+        sx = 1 if x0 < x1 else -1
+        dy = -abs(y1 - y0)
+        sy = 1 if y0 < y1 else -1
+        err = dx + dy
+        while True:
+            for ox in range(-t // 2, t // 2 + 1):
+                for oy in range(-t // 2, t // 2 + 1):
+                    set_px(x0 + ox, y0 + oy, c)
+            if x0 == x1 and y0 == y1:
+                break
+            e2 = 2 * err
+            if e2 >= dy:
+                err += dy
+                x0 += sx
+            if e2 <= dx:
+                err += dx
+                y0 += sy
+
+    def draw_rect(x: int, y: int, w: int, h: int, c=(0, 0, 0), t: int = 3, dashed: bool = False):
+        def dash_points(x0, y0, x1, y1):
+            # Bresenham plus on/off dashes
+            dx = abs(x1 - x0)
+            sx = 1 if x0 < x1 else -1
+            dy = -abs(y1 - y0)
+            sy = 1 if y0 < y1 else -1
+            err = dx + dy
+            on = True
+            step = 0
+            period = 10
+            points = []
+            while True:
+                if on:
+                    points.append((x0, y0))
+                step = (step + 1) % period
+                if step == 0:
+                    on = not on
+                if x0 == x1 and y0 == y1:
+                    break
+                e2 = 2 * err
+                if e2 >= dy:
+                    err += dy
+                    x0 += sx
+                if e2 <= dx:
+                    err += dx
+                    y0 += sy
+            return points
+
+        if dashed:
+            for (x0, y0, x1, y1) in [
+                (x, y, x + w, y),
+                (x + w, y, x + w, y + h),
+                (x + w, y + h, x, y + h),
+                (x, y + h, x, y),
+            ]:
+                for px, py in dash_points(x0, y0, x1, y1):
+                    for ox in range(-t // 2, t // 2 + 1):
+                        for oy in range(-t // 2, t // 2 + 1):
+                            set_px(px + ox, py + oy, c)
+        else:
+            draw_line(x, y, x + w, y, c, t)
+            draw_line(x + w, y, x + w, y + h, c, t)
+            draw_line(x + w, y + h, x, y + h, c, t)
+            draw_line(x, y + h, x, y, c, t)
+
+    def draw_arrow(x0: int, y0: int, x1: int, y1: int, c=(0, 0, 0)):
+        draw_line(x0, y0, x1, y1, c, 3)
+        # simple arrow head
+        vx, vy = x1 - x0, y1 - y0
+        length = max((vx * vx + vy * vy) ** 0.5, 1.0)
+        ux, uy = vx / length, vy / length
+        # perpendicular
+        px, py = -uy, ux
+        ah = 10  # head length
+        aw = 6   # head width
+        hx, hy = int(x1 - ux * ah), int(y1 - uy * ah)
+        lx, ly = int(hx + px * aw), int(hy + py * aw)
+        rx, ry = int(hx - px * aw), int(hy - py * aw)
+        draw_line(x1, y1, lx, ly, c, 2)
+        draw_line(x1, y1, rx, ry, c, 2)
+
+    # layout
+    margin_x, margin_y = 60, 140
+    box_w, box_h = 220, 90
+    gap = 90
+    y = margin_y
+    xs = [margin_x + i * (box_w + gap) for i in range(4)]
+
+    # dashed group around middle two blocks
+    group_x = xs[1] - 20
+    group_y = y - 20
+    group_w = box_w * 2 + gap + 40
+    group_h = box_h + 40
+    draw_rect(group_x, group_y, group_w, group_h, c=(140, 140, 140), t=2, dashed=True)
+
+    # blocks
+    for idx, x in enumerate(xs):
+        draw_rect(x, y, box_w, box_h, c=(0, 0, 0), t=2)
+        if with_labels:
+            # draw simple 5x7 bitmap text using ASCII-only; non-ASCII removed
+            label = None
+            if labels and idx < len(labels):
+                label = labels[idx]
+            _draw_label_text(pixels, x, y, box_w, box_h, label)
+
+    # arrows between blocks (center-right to center-left)
+    cy = y + box_h // 2
+    for i in range(3):
+        x0 = xs[i] + box_w
+        x1 = xs[i + 1]
+        draw_arrow(x0 + 4, cy, x1 - 4, cy, c=(0, 0, 0))
+
+    # write PNG
+    def png_chunk(tag: bytes, data: bytes) -> bytes:
+        return struct.pack(">I", len(data)) + tag + data + struct.pack(">I", binascii.crc32(tag + data) & 0xFFFFFFFF)
+
+    raw = bytearray()
+    for row in pixels:
+        raw.append(0)  # filter type 0
+        for r, g, b in row:
+            raw.extend((r & 255, g & 255, b & 255))
+    comp = zlib.compress(bytes(raw), level=9)
+    sig = b"\x89PNG\r\n\x1a\n"
+    ihdr = struct.pack(">IIBBBBB", W, H, 8, 2, 0, 0, 0)  # 8-bit, truecolor RGB
+    png = sig + png_chunk(b"IHDR", ihdr) + png_chunk(b"IDAT", comp) + png_chunk(b"IEND", b"")
+    path.parent.mkdir(parents=True, exist_ok=True)
+    with open(path, "wb") as f:
+        f.write(png)
+
+    # end _write_placeholder_diagram
+
+
+def _write_placeholder_diagram_pil(path: Path, *, with_labels: bool, labels: Optional[List[str]] = None) -> None:
+    from PIL import Image, ImageDraw, ImageFont  # type: ignore
+
+    W, H = 1280, 480
+    img = Image.new("RGB", (W, H), (255, 255, 255))
+    draw = ImageDraw.Draw(img)
+
+    margin_x, margin_y = 80, 160
+    box_w, box_h = 240, 110
+    gap = 110
+    y = margin_y
+    xs = [margin_x + i * (box_w + gap) for i in range(4)]
+
+    # dashed group
+    group_x = xs[1] - 24
+    group_y = y - 24
+    group_w = box_w * 2 + gap + 48
+    group_h = box_h + 48
+    draw.rounded_rectangle([group_x, group_y, group_x + group_w, group_y + group_h], radius=14, outline=(150, 150, 150), width=2)
+    # manual dash overlay
+    # top and bottom dashed
+    def dashed_line(p0, p1, dash=12, gaplen=8, width=2, fill=(150, 150, 150)):
+        from math import hypot
+        (x0, y0), (x1, y1) = p0, p1
+        dx, dy = x1 - x0, y1 - y0
+        length = (dx * dx + dy * dy) ** 0.5
+        if length == 0:
+            return
+        ux, uy = dx / length, dy / length
+        dist = 0.0
+        on = True
+        while dist < length:
+            l = dash if on else gaplen
+            nx0 = x0 + ux * dist
+            ny0 = y0 + uy * dist
+            nx1 = x0 + ux * min(length, dist + l)
+            ny1 = y0 + uy * min(length, dist + l)
+            if on:
+                draw.line([(nx0, ny0), (nx1, ny1)], fill=fill, width=width)
+            dist += l
+            on = not on
+
+    dashed_line((group_x, group_y), (group_x + group_w, group_y), width=2)
+    dashed_line((group_x, group_y + group_h), (group_x + group_w, group_y + group_h), width=2)
+    dashed_line((group_x, group_y), (group_x, group_y + group_h), width=2)
+    dashed_line((group_x + group_w, group_y), (group_x + group_w, group_y + group_h), width=2)
+
+    # blocks
+    for x in xs:
+        draw.rounded_rectangle([x, y, x + box_w, y + box_h], radius=12, outline=(0, 0, 0), width=3)
+
+    # arrows
+    cy = y + box_h // 2
+    for i in range(3):
+        x0 = xs[i] + box_w + 6
+        x1 = xs[i + 1] - 6
+        draw.line([(x0, cy), (x1, cy)], fill=(0, 0, 0), width=3)
+        # head
+        ah, aw = 14, 8
+        draw.line([(x1, cy), (x1 - ah, cy - aw)], fill=(0, 0, 0), width=3)
+        draw.line([(x1, cy), (x1 - ah, cy + aw)], fill=(0, 0, 0), width=3)
+
+    # labels
+    if with_labels:
+        # pick font
+        try:
+            font = ImageFont.truetype("DejaVuSans.ttf", 22)
+        except Exception:
+            font = ImageFont.load_default()
+        fallback = ["PATCH EMBED", "+ CLS + POSENC", "ENCODER xL", "CLASS HEAD"]
+        for i, x in enumerate(xs):
+            text = None
+            if labels and i < len(labels):
+                text = labels[i]
+            if not text or not isinstance(text, str) or not text.strip():
+                text = fallback[i % len(fallback)]
+            # center text
+            tw, th = draw.textlength(text, font=font), font.size + 6
+            tx = x + (box_w - tw) / 2
+            ty = y + (box_h - th) / 2
+            draw.text((tx, ty), text, fill=(40, 40, 40), font=font, align="center")
+
+    path.parent.mkdir(parents=True, exist_ok=True)
+    img.save(path)
+
+
+def _draw_label_text(pixels: List[List[List[int]]], x: int, y: int, w: int, h: int, label: Optional[str]) -> None:
+    # Render up to two lines of ASCII uppercase text inside a box using a 5x7 bitmap font
+    import re
+    inner_x = x + 10
+    inner_y = y + 10
+    inner_w = w - 20
+    inner_h = h - 20
+    if inner_w <= 0 or inner_h <= 0:
+        return
+    if not label:
+        # fallback: generic placeholder
+        label = "BLOCK"
+    # Normalize to ASCII upper
+    text = label.upper()
+    # allow only A-Z, 0-9, space and '-'
+    text = re.sub(r"[^A-Z0-9 \-]", " ", text)
+    if not text.strip():
+        text = "BLOCK"
+    # simple wrap by width
+    scale = 3  # enlarge 5x7 glyphs for readability
+    char_w = 6 * scale  # 5px glyph + 1px spacing
+    max_chars = max(1, inner_w // char_w)
+    words = text.split()
+    lines: List[str] = []
+    line = ""
+    for wtok in words:
+        token = wtok
+        if line:
+            candidate = line + " " + token
+        else:
+            candidate = token
+        if len(candidate) <= max_chars:
+            line = candidate
+        else:
+            if line:
+                lines.append(line)
+                line = token
+            else:
+                # force cut
+                lines.append(token[:max_chars])
+                line = token[max_chars:]
+    if line:
+        lines.append(line)
+    # limit to 2 lines for clarity
+    lines = lines[:2]
+    # vertical centering
+    total_h = len(lines) * ((7 * scale) + scale) - scale
+    start_y = inner_y + (inner_h - total_h) // 2
+
+    for i, ln in enumerate(lines):
+        # center each line horizontally
+        line_w = len(ln) * (6 * scale)
+        start_x = inner_x + max(0, (inner_w - line_w) // 2)
+        draw_text_5x7(pixels, start_x, start_y + i * ((7 * scale) + scale), ln, color=(40, 40, 40), scale=scale)
+
+
+_FONT_5x7: Dict[str, List[str]] = {
+    'A': ["01110","10001","10001","11111","10001","10001","10001"],
+    'B': ["11110","10001","11110","10001","10001","10001","11110"],
+    'C': ["01111","10000","10000","10000","10000","10000","01111"],
+    'D': ["11110","10001","10001","10001","10001","10001","11110"],
+    'E': ["11111","10000","11110","10000","10000","10000","11111"],
+    'F': ["11111","10000","11110","10000","10000","10000","10000"],
+    'G': ["01110","10000","10000","10111","10001","10001","01110"],
+    'H': ["10001","10001","11111","10001","10001","10001","10001"],
+    'I': ["11111","00100","00100","00100","00100","00100","11111"],
+    'J': ["00111","00010","00010","00010","10010","10010","01100"],
+    'K': ["10001","10010","10100","11000","10100","10010","10001"],
+    'L': ["10000","10000","10000","10000","10000","10000","11111"],
+    'M': ["10001","11011","10101","10101","10001","10001","10001"],
+    'N': ["10001","11001","10101","10011","10001","10001","10001"],
+    'O': ["01110","10001","10001","10001","10001","10001","01110"],
+    'P': ["11110","10001","10001","11110","10000","10000","10000"],
+    'Q': ["01110","10001","10001","10001","10101","10010","01101"],
+    'R': ["11110","10001","10001","11110","10100","10010","10001"],
+    'S': ["01111","10000","10000","01110","00001","00001","11110"],
+    'T': ["11111","00100","00100","00100","00100","00100","00100"],
+    'U': ["10001","10001","10001","10001","10001","10001","01110"],
+    'V': ["10001","10001","10001","10001","01010","01010","00100"],
+    'W': ["10001","10001","10001","10101","10101","11011","10001"],
+    'X': ["10001","01010","00100","00100","01010","10001","10001"],
+    'Y': ["10001","01010","00100","00100","00100","00100","00100"],
+    'Z': ["11111","00001","00010","00100","01000","10000","11111"],
+    '0': ["01110","10001","10011","10101","11001","10001","01110"],
+    '1': ["00100","01100","00100","00100","00100","00100","01110"],
+    '2': ["01110","10001","00001","00010","00100","01000","11111"],
+    '3': ["11110","00001","00001","00110","00001","00001","11110"],
+    '4': ["00010","00110","01010","10010","11111","00010","00010"],
+    '5': ["11111","10000","11110","00001","00001","10001","01110"],
+    '6': ["00110","01000","10000","11110","10001","10001","01110"],
+    '7': ["11111","00001","00010","00100","01000","01000","01000"],
+    '8': ["01110","10001","10001","01110","10001","10001","01110"],
+    '9': ["01110","10001","10001","01111","00001","00010","01100"],
+    '-': ["00000","00000","00000","11111","00000","00000","00000"],
+    ' ': ["00000","00000","00000","00000","00000","00000","00000"],
+}
+
+
+def draw_text_5x7(pixels: List[List[List[int]]], x: int, y: int, text: str, color=(60, 60, 60), scale: int = 1) -> None:
+    max_y = len(pixels) - 1
+    max_x = len(pixels[0]) - 1 if pixels else -1
+
+    def set_px(px: int, py: int):
+        if 0 <= px <= max_x and 0 <= py <= max_y:
+            pixels[py][px][0] = color[0]
+            pixels[py][px][1] = color[1]
+            pixels[py][px][2] = color[2]
+
+    cx = x
+    for ch in text:
+        glyph = _FONT_5x7.get(ch, _FONT_5x7[' '])
+        for gy, row in enumerate(glyph):
+            for gx, bit in enumerate(row):
+                if bit == '1':
+                    # draw scaled pixel
+                    for sy in range(scale):
+                        for sx in range(scale):
+                            set_px(cx + gx * scale + sx, y + gy * scale + sy)
+        cx += 6 * scale  # 5px width + spacing
+
+
+# ------------------------- Real Google GenAI calls -------------------------
+
+def _real_gemini(kind: str, *, api_key: str, **kwargs) -> Dict[str, Any]:
+    import google.generativeai as genai
+
+    genai.configure(api_key=api_key)
+
+    # Model names are configurable via env with safe defaults.
+    # You can set GEMINI_MODEL to your provisioned model, e.g. "gemini-2.0-flash-exp" or "gemini-1.5-flash".
+    text_model_name = os.getenv("GEMINI_MODEL", "gemini-1.5-flash")
+    image_model_name = os.getenv("GEMINI_IMAGE_MODEL", "")  # e.g. "gemini-2.5-flash-image"
+    image_edit_model_name = os.getenv("GEMINI_IMAGE_EDIT_MODEL", "")  # e.g. "gemini-2.5-flash-image"
+
+    if kind == "parse":
+        from .. import prompts as _p
+        prompt = _p.build_parse_prompt()
+        user_text = kwargs.get("user_text", "")
+        model = genai.GenerativeModel(text_model_name)
+        resp = model.generate_content([prompt, user_text])
+        content = _first_text(resp)
+        data = _robust_json(content)
+        if not isinstance(data, dict):
+            raise ValueError("parse: model did not return JSON dict")
+        return {"spec": data}
+
+    if kind == "plan":
+        spec = kwargs.get("spec", {})
+        from .. import prompts as _p
+        prompt = _p.build_plan_prompt()
+        model = genai.GenerativeModel(text_model_name)
+        resp = model.generate_content([prompt, json.dumps(spec, ensure_ascii=False)])
+        spec_text = _first_text(resp)
+        return {"spec_text": spec_text.strip()}
+
+    if kind == "prompt_generate":
+        K = int(kwargs.get("K", 3))
+        spec_text = kwargs.get("spec_text", "")
+        from .. import prompts as _p
+        prompt = _p.build_promptgen_prompt(K, spec_text)
+        model = genai.GenerativeModel(text_model_name)
+        resp = model.generate_content(prompt)
+        content = _first_text(resp)
+        arr = _robust_json(content)
+        if not isinstance(arr, list):
+            # fallback: split lines
+            arr = [ln.strip("- ") for ln in content.splitlines() if ln.strip()][:K]
+        return {"prompts": arr[:K]}
+
+    if kind == "image_generate":
+        prompts: List[str] = kwargs.get("prompts", [])
+        outdir: str = kwargs.get("outdir", "artifacts")
+        if not image_model_name:
+            raise ValueError("GEMINI_IMAGE_MODEL is not set. Please set it to a valid image model (e.g., 'gemini-2.5-flash-image' or 'gemini-2.5-flash-image-preview').")
+        try:
+            from concurrent.futures import ThreadPoolExecutor, as_completed
+
+            Path(outdir).mkdir(parents=True, exist_ok=True)
+            max_workers = max(1, min(len(prompts), int(_os.getenv("NNG_CONCURRENCY", "4"))))
+
+            def _gen_one(i: int, p: str) -> str:
+                # new model per thread to avoid cross-thread state issues
+                mdl = genai.GenerativeModel(model_name=image_model_name)
+                resp = mdl.generate_content(p, request_options={"timeout": 180})
+                try:
+                    (Path(outdir) / f"candidate_{i}.resp.txt").write_text(str(resp))
+                except Exception:
+                    pass
+                img_bytes, mime = _first_image_bytes(resp)
+                if not img_bytes:
+                    raise ValueError("image model did not return image bytes; see *.resp.txt")
+                ext = ".png" if mime == "image/png" else ".jpg"
+                pth = Path(outdir) / f"candidate_{i}{ext}"
+                with open(pth, "wb") as f:
+                    f.write(img_bytes)
+                with open(str(pth) + ".meta.json", "w", encoding="utf-8") as mf:
+                    mf.write(json.dumps({"source": "gemini", "mime": mime, "bytes": len(img_bytes)}, ensure_ascii=False))
+                return str(pth)
+
+            futures = []
+            with ThreadPoolExecutor(max_workers=max_workers) as ex:
+                for i, p in enumerate(prompts):
+                    futures.append(ex.submit(_gen_one, i, p))
+                # preserve order by index
+                results = [None] * len(prompts)
+                for fut in as_completed(futures):
+                    # find index by result path name
+                    path = fut.result()
+                    stem = Path(path).stem
+                    try:
+                        idx = int(stem.split("_")[-1])
+                    except Exception:
+                        idx = 0
+                    results[idx] = path
+            # fill any missing in order fallback
+            paths: List[str] = [r or "" for r in results]
+            return {"paths": paths}
+        except Exception:
+            raise
+
+    if kind == "judge":
+        image_path: str = kwargs.get("image_path")
+        spec = kwargs.get("spec", {})
+        model = genai.GenerativeModel(text_model_name)
+        from .. import prompts as _p
+        judge_prompt = _p.build_judge_prompt()
+        image_part = _image_part_from_path(image_path)
+        resp = model.generate_content([
+            {"text": judge_prompt},
+            {"text": json.dumps(spec, ensure_ascii=False)},
+            image_part,
+        ])
+        content = _first_text(resp)
+        data = _robust_json(content)
+        if not isinstance(data, dict):
+            raise ValueError("judge: non-JSON")
+        score = float(max(0.0, min(1.0, data.get("score", 0.0))))
+        violations = list(data.get("violations", []))
+        return {"score": score, "violations": violations}
+
+    if kind == "image_edit":
+        image_path: str = kwargs.get("image_path")
+        out_path: str = kwargs.get("out_path")
+        instructions: str = kwargs.get("instructions", "")
+        ref_images: List[str] = list(kwargs.get("ref_images", []) or [])
+        if not image_edit_model_name:
+            raise ValueError("GEMINI_IMAGE_EDIT_MODEL is not set. Please set it to a valid image edit model (e.g., 'gemini-2.5-flash-image' or 'gemini-2.5-flash-image-preview').")
+        try:
+            model = genai.GenerativeModel(model_name=image_edit_model_name)
+            base_img = _image_part_from_path(image_path)
+            from .. import prompts as _p
+            parts = [{"text": _p.build_image_edit_prompt(instructions)}, base_img]
+            for rp in ref_images:
+                try:
+                    parts.append(_image_part_from_path(rp))
+                except Exception:
+                    continue
+            resp = model.generate_content(parts, request_options={"timeout": 120})
+            try:
+                out_p = Path(out_path)
+                out_p.parent.mkdir(parents=True, exist_ok=True)
+                (out_p.parent / (out_p.stem + ".resp.txt")).write_text(str(resp))
+            except Exception:
+                pass
+            img_bytes, mime = _first_image_bytes(resp)
+            if not img_bytes:
+                raise ValueError("image edit returned no image; see *.resp.txt for raw response")
+            ext = ".png" if mime == "image/png" else ".jpg"
+            out_p = Path(out_path)
+            out_p.parent.mkdir(parents=True, exist_ok=True)
+            with open(out_p, "wb") as f:
+                f.write(img_bytes)
+            with open(str(out_p) + ".meta.json", "w", encoding="utf-8") as mf:
+                mf.write(json.dumps({"source": "gemini", "mime": mime, "bytes": len(img_bytes)}, ensure_ascii=False))
+            return {"path": str(out_p)}
+        except Exception:
+            # surface the error rather than fall back, per the no-local-rendering requirement
+            raise
+
+    if kind == "image_fuse":
+        # Create a new image by composing multiple reference images under textual instructions
+        out_path: str = kwargs.get("out_path")
+        instructions: str = kwargs.get("instructions", "")
+        ref_images: List[str] = list(kwargs.get("ref_images", []) or [])
+        if not image_model_name:
+            raise ValueError("GEMINI_IMAGE_MODEL is not set. Please set it to a valid image model (e.g., 'gemini-2.5-flash-image' or 'gemini-2.5-flash-image-preview').")
+        try:
+            model = genai.GenerativeModel(model_name=image_model_name)
+            from .. import prompts as _p
+            parts = [{"text": _p.build_image_fusion_prompt(instructions)}]
+            for rp in ref_images:
+                try:
+                    parts.append(_image_part_from_path(rp))
+                except Exception:
+                    continue
+            resp = model.generate_content(parts, request_options={"timeout": 120})
+            try:
+                out_p = Path(out_path)
+                out_p.parent.mkdir(parents=True, exist_ok=True)
+                (out_p.parent / (out_p.stem + ".resp.txt")).write_text(str(resp))
+            except Exception:
+                pass
+            img_bytes, mime = _first_image_bytes(resp)
+            if not img_bytes:
+                raise ValueError("image fuse returned no image; see *.resp.txt for raw response")
+            out_p = Path(out_path)
+            out_p.parent.mkdir(parents=True, exist_ok=True)
+            with open(out_p, "wb") as f:
+                f.write(img_bytes)
+            with open(str(out_p) + ".meta.json", "w", encoding="utf-8") as mf:
+                mf.write(json.dumps({"source": "gemini", "mime": mime, "bytes": len(img_bytes)}, ensure_ascii=False))
+            return {"path": str(out_p)}
+        except Exception:
+            raise
+
+    raise ValueError(f"Unsupported kind={kind}")
+
+
+def _first_text(resp: Any) -> str:
+    try:
+        if hasattr(resp, "text"):
+            return resp.text
+        # Some SDK versions: candidates[0].content.parts[0].text
+        cands = getattr(resp, "candidates", [])
+        if cands:
+            parts = getattr(cands[0], "content", None)
+            if parts and getattr(parts, "parts", None):
+                for part in parts.parts:
+                    if getattr(part, "text", None):
+                        return part.text
+        return str(resp)
+    except Exception:
+        return str(resp)
+
+
+def _first_image_bytes(resp: Any) -> tuple[bytes | None, str]:
+    # Try to walk through content parts and return the first inline image bytes
+    try:
+        # Newer SDK: resp.candidates[].content.parts[].inline_data
+        cands = getattr(resp, "candidates", [])
+        for c in cands or []:
+            content = getattr(c, "content", None)
+            parts = getattr(content, "parts", None) if content else None
+            for part in parts or []:
+                inline = getattr(part, "inline_data", None)
+                if inline and getattr(inline, "data", None):
+                    data = inline.data
+                    mime = getattr(inline, "mime_type", "image/png")
+                    if isinstance(data, bytes):
+                        return data, mime
+                    # some versions may base64-encode
+                    try:
+                        return base64.b64decode(data), mime
+                    except Exception:
+                        pass
+        return None, ""
+    except Exception:
+        return None, ""
+
+
+def _image_part_from_path(path: str) -> Dict[str, Any]:
+    # google-generativeai accepts a dict with mime_type and raw data bytes for images
+    p = Path(path)
+    mime = "image/png" if p.suffix.lower() == ".png" else "image/jpeg"
+    data = p.read_bytes()
+    return {"mime_type": mime, "data": data}
+
+
+def _robust_json(text: str) -> Any:
+    # Try to parse the whole text, then attempt to extract the first {...} or [...] block
+    try:
+        return json.loads(text)
+    except Exception:
+        pass
+    # crude extraction
+    start = text.find("{")
+    end = text.rfind("}")
+    if start != -1 and end != -1 and end > start:
+        try:
+            return json.loads(text[start : end + 1])
+        except Exception:
+            pass
+    start = text.find("[")
+    end = text.rfind("]")
+    if start != -1 and end != -1 and end > start:
+        try:
+            return json.loads(text[start : end + 1])
+        except Exception:
+            pass
+    return {}
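
Because `call_gemini` degrades to deterministic placeholders when no key is configured, the wrapper can be exercised directly; a sketch, assuming `GEMINI_API_KEY` is unset:

```
# Offline smoke test of the unified wrapper (uses local placeholders)
from app.llm.gemini import call_gemini

spec = call_gemini("parse", user_text="Input\nConv\nFC\nSoftmax")["spec"]
print(spec["nodes"])  # ['N0:Input', 'N1:Conv', 'N2:FC', 'N3:Softmax']
paths = call_gemini("image_generate",
                    prompts=["p0", "p1"], outdir="artifacts/demo")["paths"]
print(paths)  # ['artifacts/demo/candidate_0.png', 'artifacts/demo/candidate_1.png']
```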
NNGen/app/nodes/__init__.py ADDED
@@ -0,0 +1 @@
+from . import parser, planner, prompt_gen, gen_generate, gen_labels, judge, select, edit, archive
NNGen/app/nodes/archive.py ADDED
@@ -0,0 +1,35 @@
+from __future__ import annotations
+
+import json
+from pathlib import Path
+
+from ..state import AppState
+
+
+def run(state: AppState) -> AppState:
+    outdir = Path(state["outdir"])  # ensured by CLI
+    outdir.mkdir(parents=True, exist_ok=True)
+
+    # dump spec
+    if state.get("spec"):
+        (outdir / "spec.json").write_text(json.dumps(state["spec"], ensure_ascii=False, indent=2))
+    if state.get("spec_text"):
+        (outdir / "spec.txt").write_text(state["spec_text"])
+
+    # dump prompts
+    if state.get("prompts"):
+        (outdir / "prompts.json").write_text(json.dumps(state["prompts"], ensure_ascii=False, indent=2))
+
+    # dump scores
+    if state.get("scores"):
+        (outdir / "scores.json").write_text(json.dumps(state["scores"], ensure_ascii=False, indent=2))
+
+    # copy/rename final image
+    if state.get("best_image"):
+        src = Path(state["best_image"].path)
+        dst = outdir / "final.png"
+        if src.exists():
+            dst.write_bytes(src.read_bytes())
+
+    return state
+
NNGen/app/nodes/edit.py ADDED
@@ -0,0 +1,80 @@
+from __future__ import annotations
+
+from pathlib import Path
+from typing import Dict
+
+from ..llm.gemini import call_gemini
+from ..state import AppState
+
+
+def _labels_from_spec(state: AppState) -> list[str]:
+    spec = state.get("spec", {}) or {}
+    raw_nodes = spec.get("nodes", []) or []
+    labels: list[str] = []
+    for n in raw_nodes:
+        label = None
+        if isinstance(n, str):
+            # strip index prefixes like "N0: ..."
+            parts = n.split(":", 1)
+            label = parts[1].strip() if len(parts) == 2 else n.strip()
+        elif isinstance(n, dict):
+            label = n.get("label") or n.get("name") or n.get("id")
+        if label:
+            labels.append(str(label))
+    # keep order and duplicates: labels map to blocks by position
+    return labels
+
+
+def _ascii_friendly_labels(state: AppState) -> list[str]:
+    labels = _labels_from_spec(state)
+
+    def is_ascii(s: str) -> bool:
+        try:
+            s.encode('ascii')
+            return True
+        except Exception:
+            return False
+
+    if not labels or sum(1 for l in labels if is_ascii(l)) == 0:
+        return [
+            "PATCH EMBEDDING",
+            "CLS + POSENC",
+            "ENCODER xL",
+            "CLASS HEAD",
+        ]
+    return labels
+
+
+def plan_edits(state: AppState) -> str:
+    hard_violations = [str(v) for v in state.get("hard_violations", [])]
+    violations = [str(v) for v in state.get("violations", [])]
+    # If the judge reports missing labels (prefer HARD), provide an add-labels instruction
+    hv = hard_violations or violations
+    if any(("labels" in v.lower() and "missing" in v.lower()) for v in hv):
+        labels = _labels_from_spec(state)
+        numbered = "\n".join([f"{i+1}: \"{lbl}\"" for i, lbl in enumerate(labels)]) or "(no labels provided)"
+        return (
+            "Add text labels INSIDE each rectangular block without changing geometry, arrows, spacing, sizes, or colors. "
+            "Map labels in left→right, top→bottom order; reuse identical labels for repeated blocks. "
+            "Use a clean sans-serif font in solid black or dark gray, consistent size.\n"
+            f"Labels list:\n{numbered}"
+        )
+
+    # Default: targeted fixes based on judge violations, but always provide the labels list to preserve text in offline mode
+    fixes = "; ".join(violations) if violations else "typos, arrow direction, spacing/legibility, and style compliance"
+    labels = _ascii_friendly_labels(state)
+    numbered = "\n".join([f"{i+1}: \"{lbl}\"" for i, lbl in enumerate(labels)]) or "(no labels provided)"
+    return (
+        f"Fix the following issues precisely: {fixes}. "
+        "Do not move or reshape elements. Only adjust text (content/position/size), arrow direction styles, and minimal styling to reach paper standards.\n"
+        f"Labels list:\n{numbered}"
+    )
+
+
+def apply_edits(state: AppState) -> AppState:
+    if not state.get("best_image"):
+        return state
+    src = state["best_image"].path
+    out_path = str(Path(state["outdir"]) / f"edited_round_{state.get('round', 0)}.png")
+    _ = call_gemini("image_edit", image_path=src, out_path=out_path, instructions=plan_edits(state))
+    # replace best_image with the edited one
+    state["best_image"].path = out_path  # type: ignore
+    return state
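
To see what `plan_edits` sends to the image editor for a missing-labels violation, it can be called with a hand-built state; a sketch (a plain dict is used for illustration):

```
# Sketch: inspect the edit instruction produced for a missing-labels violation
from app.nodes.edit import plan_edits

state = {
    "hard_violations": ["HARD: labels: missing"],
    "violations": [],
    "spec": {"nodes": ["N0:Patch Embed", "N1:Encoder xL"]},
}
print(plan_edits(state))  # add-labels directive ending with: 1: "Patch Embed" / 2: "Encoder xL"
```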
NNGen/app/nodes/gen_fusion.py ADDED
@@ -0,0 +1,73 @@
+from __future__ import annotations
+
+import os
+from concurrent.futures import ThreadPoolExecutor, as_completed
+from pathlib import Path
+from typing import Dict, List
+
+from ..llm.gemini import call_gemini
+from ..state import AppState, ImageArtifact
+
+
+def run(state: AppState) -> AppState:
+    """Generate K fused candidates by composing reference images under instructions.
+
+    Expected in state:
+    - outdir: str
+    - K: int
+    - base_image: optional path (treated as first ref image if present)
+    - ref_images: optional list[str]
+    - instructions: str
+    """
+    outdir = Path(state["outdir"])  # ensured by graph
+    K = int(state.get("K", 3))
+    instructions: str = str(state.get("instructions", "")).strip()
+
+    # prepare reference list
+    refs: List[str] = []
+    if state.get("base_image"):
+        refs.append(str(state["base_image"]))
+    for r in state.get("ref_images", []) or []:
+        if r and str(r) not in refs:
+            refs.append(str(r))
+
+    if not refs:
+        raise ValueError("Fusion mode requires at least one reference image (base or ref_images)")
+
+    max_workers = max(1, min(K, int(os.getenv("NNG_CONCURRENCY", "4"))))
+
+    def _fuse_one(i: int) -> str:
+        out_path = str(outdir / f"fused_candidate_{i}.png")
+        # Use image_edit if a base image is provided; otherwise image_fuse
+        if state.get("base_image"):
+            call_gemini(
+                "image_edit",
+                image_path=str(state["base_image"]),
+                out_path=out_path,
+                instructions=f"Variant {i}: {instructions}",
+                ref_images=[p for p in refs if p != str(state["base_image"])],
+            )
+        else:
+            call_gemini(
+                "image_fuse",
+                out_path=out_path,
+                instructions=f"Variant {i}: {instructions}",
+                ref_images=refs,
+            )
+        return out_path
+
+    paths: List[str] = [""] * K
+    with ThreadPoolExecutor(max_workers=max_workers) as ex:
+        futures = [ex.submit(_fuse_one, i) for i in range(K)]
+        for fut in as_completed(futures):
+            p = fut.result()
+            try:
+                idx = int(Path(p).stem.split("_")[-1])
+            except Exception:
+                idx = 0
+            paths[idx] = p
+
+    images = [ImageArtifact(prompt=instructions, path=pth) for pth in paths if pth]
+    state["images"] = images
+    return state
+
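
`run_fusion_pipeline` consumes the same state dict as the CLI builds in image mode; a sketch of driving fusion programmatically (paths are placeholders):

```
# Hypothetical fusion-mode invocation (mirrors app/cli.py image mode)
from app.graph import run_fusion_pipeline

state = {
    "K": 2,
    "T": 0,
    "outdir": "",
    "base_image": "path/to/base.png",   # placeholder path
    "ref_images": ["path/to/ref.png"],  # placeholder path
    "instructions": "Replace the UNet backbone with a Transformer (DiT).",
}
final_state = run_fusion_pipeline(state)  # archives the first candidate as final.png
```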
NNGen/app/nodes/gen_generate.py ADDED
@@ -0,0 +1,16 @@
+from __future__ import annotations
+
+from typing import Dict, List
+
+from ..llm.gemini import call_gemini
+from ..state import AppState, ImageArtifact
+
+
+def run(state: AppState) -> AppState:
+    prompts: List[str] = state.get("prompts", [])
+    res: Dict = call_gemini("image_generate", prompts=prompts, outdir=state["outdir"])
+    paths = res.get("paths", [])
+    images = [ImageArtifact(prompt=p, path=pth) for p, pth in zip(prompts, paths)]
+    state["images"] = images
+    return state
+
NNGen/app/nodes/gen_labels.py ADDED
@@ -0,0 +1,60 @@
+ from __future__ import annotations
+
+ from pathlib import Path
+ from typing import List
+ import os
+ from concurrent.futures import ThreadPoolExecutor, as_completed
+
+ from ..llm.gemini import call_gemini
+ from ..state import AppState, ImageArtifact
+ from .edit import _labels_from_spec  # reuse label extraction
+
+
+ def run(state: AppState) -> AppState:
+     images: List[ImageArtifact] = state.get("images", []) or []
+     if not images:
+         return state
+
+     labels = _labels_from_spec(state)
+     # Fall back to ASCII-friendly defaults if labels are missing or none are ASCII
+     def _is_ascii(s: str) -> bool:
+         try:
+             s.encode("ascii")
+             return True
+         except UnicodeEncodeError:
+             return False
+     if not labels or not any(_is_ascii(l) for l in labels):
+         labels = [
+             "PATCH EMBEDDING",
+             "CLS + POSENC",
+             "ENCODER xL",
+             "CLASS HEAD",
+         ]
+     numbered = "\n".join([f"{i+1}: \"{lbl}\"" for i, lbl in enumerate(labels)]) or "(no labels provided)"
+
+     instructions = (
+         "Add labels INSIDE each rectangular block. Do not move/resize/add/remove shapes or arrows; keep layout, spacing, and colors unchanged. "
+         "Map labels in left→right, top→bottom order; reuse identical labels for repeated blocks. Use each label string exactly as given (no translation or paraphrase). "
+         "Typography: clean sans-serif, readable size, centered within blocks; at most two short lines; avoid covering arrows; no legends or titles. "
+         "If block count ≠ label count, do NOT add/remove shapes; place labels sequentially on existing blocks.\n"
+         f"Labels list:\n{numbered}"
+     )
+
+     outdir = Path(state["outdir"]) if state.get("outdir") else Path("artifacts")
+     max_workers = max(1, min(len(images), int(os.getenv("NNG_CONCURRENCY", "4"))))
+     results: List[ImageArtifact | None] = [None] * len(images)
+
+     def _label_one(i: int, im: ImageArtifact) -> tuple[int, str]:
+         src = im.path
+         out_path = str(outdir / f"labeled_candidate_{i}.png")
+         _ = call_gemini("image_edit", image_path=src, out_path=out_path, instructions=instructions)
+         return i, out_path
+
+     with ThreadPoolExecutor(max_workers=max_workers) as ex:
+         futures = [ex.submit(_label_one, i, im) for i, im in enumerate(images)]
+         for fut in as_completed(futures):
+             i, out_path = fut.result()
+             results[i] = ImageArtifact(prompt=images[i].prompt, path=out_path, meta={"stage": "labels"})
+
+     state["images"] = [im for im in results if im is not None]
+     return state
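For reference, the numbered "Labels list" block this node appends to its edit instructions renders like so (a tiny standalone sketch of the same f-string):

# Minimal sketch: the numbered label block gen_labels sends to the image editor.
labels = ["PATCH EMBEDDING", "CLS + POSENC", "ENCODER xL", "CLASS HEAD"]
numbered = "\n".join(f'{i + 1}: "{lbl}"' for i, lbl in enumerate(labels))
print(numbered)
# 1: "PATCH EMBEDDING"
# 2: "CLS + POSENC"
# 3: "ENCODER xL"
# 4: "CLASS HEAD"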
NNGen/app/nodes/judge.py ADDED
@@ -0,0 +1,46 @@
+ from __future__ import annotations
+
+ from typing import Dict, List
+ import os
+ from concurrent.futures import ThreadPoolExecutor, as_completed
+
+ from ..llm.gemini import call_gemini
+ from ..state import AppState, ScoreItem
+
+
+ def run(state: AppState) -> AppState:
+     images = list(state.get("images", []))
+     if not images:
+         state["scores"] = []
+         return state
+
+     max_workers = max(1, min(len(images), int(os.getenv("NNG_CONCURRENCY", "4"))))
+     results: List[ScoreItem | None] = [None] * len(images)
+
+     def _judge_one(i: int) -> tuple[int, Dict]:
+         im = images[i]
+         res: Dict = call_gemini("judge", image_path=im.path, spec=state.get("spec", {}))
+         return i, res
+
+     with ThreadPoolExecutor(max_workers=max_workers) as ex:
+         # Map futures to indices so a failed call is scored against the right image
+         future_to_idx = {ex.submit(_judge_one, i): i for i in range(len(images))}
+         for fut in as_completed(future_to_idx):
+             idx = future_to_idx[fut]
+             im = images[idx]
+             try:
+                 _, res = fut.result()
+                 results[idx] = {
+                     "image_path": im.path,
+                     "score": float(res.get("score", 0.0)),
+                     "violations": list(res.get("violations", [])),
+                 }
+             except Exception as e:
+                 results[idx] = {
+                     "image_path": im.path,
+                     "score": 0.0,
+                     "violations": [f"judge error: {e}"],
+                 }
+
+     state["scores"] = [s for s in results if s is not None]
+     return state
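The judge contract is "strict JSON in, ScoreItem out"; a hedged sketch of the validation a caller might add (the `raw` payload and the `to_score_item` helper are illustrative, not part of the repo):

# Minimal sketch: coerce a raw judge reply into the ScoreItem shape used above.
import json

from app.state import ScoreItem

raw = '{"score": 0.72, "violations": ["HARD: labels: missing"]}'  # illustrative payload

def to_score_item(image_path: str, raw_json: str) -> ScoreItem:
    try:
        data = json.loads(raw_json)
    except json.JSONDecodeError:
        data = {}
    score = float(data.get("score", 0.0)) if isinstance(data, dict) else 0.0
    vios = data.get("violations", []) if isinstance(data, dict) else []
    return {
        "image_path": image_path,
        "score": max(0.0, min(1.0, score)),  # clamp into the promised [0, 1] range
        "violations": [str(v) for v in vios],
    }

print(to_score_item("artifacts/candidate_0.png", raw))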
NNGen/app/nodes/parser.py ADDED
@@ -0,0 +1,15 @@
+ from __future__ import annotations
+
+ from typing import Dict
+
+ from ..llm.gemini import call_gemini
+ from ..state import AppState
+
+
+ def run(state: AppState) -> AppState:
+     if state.get("spec"):
+         return state
+     res: Dict = call_gemini("parse", user_text=state.get("user_text", ""))
+     state["spec"] = res.get("spec", {})
+     return state
+
NNGen/app/nodes/planner.py ADDED
@@ -0,0 +1,15 @@
+ from __future__ import annotations
+
+ from typing import Dict
+
+ from ..llm.gemini import call_gemini
+ from ..state import AppState
+
+
+ def run(state: AppState) -> AppState:
+     if state.get("spec_text"):
+         return state
+     res: Dict = call_gemini("plan", spec=state.get("spec", {}))
+     state["spec_text"] = res.get("spec_text", "")
+     return state
+
NNGen/app/nodes/prompt_gen.py ADDED
@@ -0,0 +1,15 @@
+ from __future__ import annotations
+
+ from typing import Dict
+
+ from ..llm.gemini import call_gemini
+ from ..state import AppState
+
+
+ def run(state: AppState) -> AppState:
+     if state.get("prompts"):
+         return state
+     res: Dict = call_gemini("prompt_generate", spec_text=state.get("spec_text", ""), K=state.get("K", 3))
+     state["prompts"] = res.get("prompts", [])
+     return state
+
NNGen/app/nodes/select.py ADDED
@@ -0,0 +1,27 @@
+ from __future__ import annotations
+
+ from ..state import AppState, ImageArtifact
+
+
+ def run(state: AppState) -> AppState:
+     scores = sorted(state.get("scores", []), key=lambda s: s["score"], reverse=True)
+     if not scores:
+         return state
+     best = scores[0]
+     best_img_path = best["image_path"]
+     # Find the ImageArtifact corresponding to the best-scored path
+     best_image: ImageArtifact | None = None
+     for im in state.get("images", []):
+         if im.path == best_img_path:
+             best_image = im
+             break
+     vios = [str(v) for v in best.get("violations", [])]
+     # Identify hard violations: an explicit "HARD:" marker, else a missing-labels heuristic
+     hard = [v for v in vios if v.strip().lower().startswith("hard:")]
+     if not hard:
+         hard = [v for v in vios if ("labels" in v.lower() and "missing" in v.lower())]
+
+     state["best_image"] = best_image
+     state["violations"] = vios
+     state["hard_violations"] = hard
+     return state
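The hard-violation heuristic is easy to check in isolation; a small sketch with made-up violation strings:

# Minimal sketch: the same hard-violation heuristic as select.run, on sample inputs.
vios = ["HARD: labels: missing", "arrow spacing uneven", "Labels missing on two blocks"]

hard = [v for v in vios if v.strip().lower().startswith("hard:")]
if not hard:  # fallback: treat "labels ... missing" phrasing as hard
    hard = [v for v in vios if "labels" in v.lower() and "missing" in v.lower()]

print(hard)  # -> ['HARD: labels: missing']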
NNGen/app/prompts.py ADDED
@@ -0,0 +1,65 @@
+ from __future__ import annotations
+
+
+ def build_parse_prompt() -> str:
+     return (
+         "You are a strict parser for neural network architecture specs. "
+         "Input is natural language. Return ONLY a JSON object with fields: "
+         "nodes: string[], edges: [fromIndex, toIndex][], constraints: object. "
+         "No prose."
+     )
+
+
+ def build_plan_prompt() -> str:
+     return (
+         "Given a structured NN spec (JSON), produce a concise, fillable template text "
+         "that preserves nodes, edges, and key constraints for diagram rendering. "
+         "Emphasize left-to-right flow, explicit layer counts, and unambiguous labels."
+     )
+
+
+ def build_promptgen_prompt(K: int, spec_text: str) -> str:
+     # Stage G1: lighter, cleaner skeleton-only prompts (no hard stylistic numbers)
+     return (
+         "Create K concise prompts for an image model to draw ONLY the skeleton of a neural network diagram (no text).\n"
+         "Aim for a clean paper-figure look: flat 2D, simple shapes, balanced margins, and a calm palette. Use rectangles for modules and clear left→right arrows.\n"
+         "If the spec implies repetition (e.g., Encoder × L), you may show a dashed grouping around the repeated blocks. Avoid flashy effects (no 3D or heavy glow).\n"
+         "Return ONLY a JSON array of exactly K strings; each item is one full prompt for image generation.\n"
+         f"K={K}.\n"
+         f"Spec (summary):\n{spec_text}\n"
+         "Each prompt must mention: 'skeleton-only, no text'."
+     )
+
+
+ def build_judge_prompt() -> str:
+     # Judge content & style, optimized for two-stage (skeleton→labels) flow
+     return (
+         "You are a strict publication-figure QA judge. Given a spec (JSON) and a NN diagram image, "
+         "evaluate (A) Content correctness and (B) Paper-style compliance.\n"
+         "(A) Content (0.6): required modules present; edges/arrows reflect correct order; arrows left→right; labels exist and are spelled correctly; "
+         "layer count L indicated when applicable. If the image has no labels, include violation EXACTLY 'HARD: labels: missing'.\n"
+         "(B) Style (0.4): flat 2D; white background; minimal color (black/gray + ≤2 accents); no gradients/3D/glow/shadows/neon; "
+         "consistent stroke width; consistent sans-serif font; adequate spacing; dashed boxes for repeated blocks; high print readability.\n"
+         "Return ONLY strict JSON: {score: number in [0,1], violations: string[]}. Violations must be concrete and actionable."
+     )
+
+
+ def build_image_edit_prompt(instructions: str) -> str:
+     # G2 and later edits: add/adjust labels only; keep geometry fixed (light constraints)
+     base = (
+         "Add or adjust labels INSIDE each block, without changing any shapes, arrows, layout, spacing, or colors. "
+         "Keep a clean, readable look: flat 2D, simple sans-serif font, good contrast, and consistent size across blocks. "
+         "Center labels within blocks; use at most two short lines; avoid covering arrows; do not add legends or titles. "
+         "Use each label string exactly as provided (no translation or paraphrase). "
+     )
+     return base + f"Instructions: {instructions}"
+
+
+ def build_image_fusion_prompt(instructions: str) -> str:
+     # Compose multiple images guided by text while preserving key visual constraints
+     return (
+         "Compose a new, clean technical diagram by integrating the following reference images. "
+         "Preserve the overall paper-style look: flat 2D, white background, minimal color, consistent line width, and sans-serif text. "
+         "Follow the instructions precisely; keep geometry aligned and readable; avoid extra decorations. "
+         f"Instructions: {instructions}"
+     )
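Several of these prompts demand "ONLY JSON", which still benefits from defensive parsing on the caller side; a sketch under the assumption that models sometimes wrap output in Markdown fences (this helper is not part of the repo):

# Minimal sketch: defensively parse the "JSON array of exactly K strings" contract
# from build_promptgen_prompt. The fence stripping is a common-case guess.
import json

def parse_prompt_list(text: str, K: int) -> list[str]:
    cleaned = text.strip()
    if cleaned.startswith("```"):
        cleaned = cleaned.strip("`")
        cleaned = cleaned[cleaned.find("["):]  # drop a leading "json" language tag
    try:
        prompts = json.loads(cleaned)
    except json.JSONDecodeError:
        return []
    prompts = [p for p in prompts if isinstance(p, str)]
    return prompts[:K]  # enforce at most K prompts

print(parse_prompt_list('```json\n["p1", "p2", "p3"]\n```', 3))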
NNGen/app/state.py ADDED
@@ -0,0 +1,33 @@
+ from __future__ import annotations
+
+ from dataclasses import dataclass, field
+ from typing import Any, Dict, List, Optional, TypedDict
+
+
+ class ScoreItem(TypedDict):
+     image_path: str
+     score: float
+     violations: List[str]
+
+
+ @dataclass
+ class ImageArtifact:
+     prompt: str
+     path: str
+     meta: Dict[str, Any] = field(default_factory=dict)
+
+
+ class AppState(TypedDict, total=False):
+     user_text: str
+     spec: Dict[str, Any]
+     spec_text: str
+     K: int
+     T: int
+     round: int
+     prompts: List[str]
+     images: List[ImageArtifact]
+     scores: List[ScoreItem]
+     best_image: Optional[ImageArtifact]
+     violations: List[str]
+     hard_violations: List[str]
+     outdir: str
The diff for this file is too large to render. See raw diff
 
NNGen/notebooks/demo.ipynb ADDED
@@ -0,0 +1,141 @@
+ {
+  "cells": [
+   {
+    "cell_type": "markdown",
+    "id": "0ce88361",
+    "metadata": {},
+    "source": [
+     "# NNGen Demo — Gemini 2.5 Flash Image\n",
+     "\n",
+     "Interactive demo to generate a neural network diagram from a natural language prompt.\n",
+     "- Uses the multi-agent pipeline (`parser → planner → prompt-gen → G1 → G2 → judge → select → edit loop → archive`).\n",
+     "- All model calls go through `app.llm.gemini.call_gemini`. If `GEMINI_API_KEY` is not set, the pipeline falls back to local placeholders so the demo still runs.\n",
+     "- For image generation/editing, set `GEMINI_IMAGE_MODEL`/`GEMINI_IMAGE_EDIT_MODEL` (e.g., `gemini-2.5-flash-image` or `gemini-2.5-flash-image-preview`).\n"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": null,
+    "id": "a0f6490c",
+    "metadata": {},
+    "outputs": [],
+    "source": [
+     "# Imports (hop to the repo root so `app` and relative paths like spec/ resolve)\n",
+     "import os, json\n",
+     "from pathlib import Path\n",
+     "if not (Path.cwd() / 'app').exists():\n",
+     "    os.chdir(Path.cwd().parent)\n",
+     "from app.graph import run_pipeline\n",
+     "from app.state import AppState\n",
+     "from IPython.display import Image, display\n"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": null,
+    "id": "722a09d3",
+    "metadata": {},
+    "outputs": [],
+    "source": [
+     "# Optional: configure models here if not set in environment (.env is supported).\n",
+     "# os.environ.setdefault(\"GEMINI_MODEL\", \"gemini-2.5-flash\")\n",
+     "# os.environ.setdefault(\"GEMINI_IMAGE_MODEL\", \"gemini-2.5-flash-image\")\n",
+     "# os.environ.setdefault(\"GEMINI_IMAGE_EDIT_MODEL\", \"gemini-2.5-flash-image\")\n",
+     "print(\"GEMINI_MODEL=\", os.getenv(\"GEMINI_MODEL\", \"(default)\"))\n",
+     "print(\"GEMINI_IMAGE_MODEL=\", os.getenv(\"GEMINI_IMAGE_MODEL\", \"(default)\"))\n",
+     "print(\"GEMINI_IMAGE_EDIT_MODEL=\", os.getenv(\"GEMINI_IMAGE_EDIT_MODEL\", \"(default)\"))\n"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": null,
+    "id": "61aacffe",
+    "metadata": {},
+    "outputs": [],
+    "source": [
+     "# Enter a natural language NN spec (leave blank to use the sample in spec/vit.txt).\n",
+     "print(\"Enter your NN spec prompt (blank for sample):\")\n",
+     "user_text = input().strip()\n",
+     "if not user_text:\n",
+     "    user_text = Path('spec/vit.txt').read_text()\n",
+     "\n",
+     "# Number of candidates (K) and max edit rounds (T)\n",
+     "K = 4\n",
+     "T = 1\n",
+     "print(f\"Configured: K={K}, T={T}\")\n"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": null,
+    "id": "f6c28bb1",
+    "metadata": {},
+    "outputs": [],
+    "source": [
+     "# Run the multi-agent pipeline\n",
+     "state: AppState = {\n",
+     "    'K': K,\n",
+     "    'T': T,\n",
+     "    'user_text': user_text,\n",
+     "    'outdir': ''  # use timestamped default\n",
+     "}\n",
+     "final_state = run_pipeline(state)\n",
+     "print('Artifacts directory:', final_state['outdir'])\n"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": null,
+    "id": "a402a108",
+    "metadata": {},
+    "outputs": [],
+    "source": [
+     "# Display the final image\n",
+     "final_path = Path(final_state['outdir']) / 'final.png'\n",
+     "if final_path.exists():\n",
+     "    display(Image(filename=str(final_path)))\n",
+     "else:\n",
+     "    print('final.png not found at', final_path)\n"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": null,
+    "id": "f7518a91",
+    "metadata": {},
+    "outputs": [],
+    "source": [
+     "# (Optional) Inspect outputs: spec and scoring\n",
+     "spec_txt = Path(final_state['outdir']) / 'spec.txt'\n",
+     "scores_json = Path(final_state['outdir']) / 'scores.json'\n",
+     "if spec_txt.exists():\n",
+     "    print('--- spec.txt ---')\n",
+     "    print(spec_txt.read_text())\n",
+     "if scores_json.exists():\n",
+     "    print('--- scores.json ---')\n",
+     "    print(scores_json.read_text())\n"
+    ]
+   }
+  ],
+  "metadata": {
+   "kernelspec": {
+    "display_name": "Python 3 (ipykernel)",
+    "language": "python",
+    "name": "python3"
+   },
+   "language_info": {
+    "codemirror_mode": {
+     "name": "ipython",
+     "version": 3
+    },
+    "file_extension": ".py",
+    "mimetype": "text/x-python",
+    "name": "python",
+    "nbconvert_exporter": "python",
+    "pygments_lexer": "ipython3",
+    "version": "3.13.3"
+   }
+  },
+  "nbformat": 4,
+  "nbformat_minor": 5
+ }
NNGen/requirements.txt ADDED
@@ -0,0 +1,4 @@
+ langgraph>=0.2.0
+ google-generativeai>=0.7.0
+ python-dotenv>=1.0.1
+ gradio>=4.32.0
NNGen/runtime.txt ADDED
@@ -0,0 +1 @@
+ python-3.10
NNGen/scripts/gradio_app.py ADDED
@@ -0,0 +1,104 @@
+ from __future__ import annotations
+
+ import os
+ import shutil
+ from pathlib import Path
+ from typing import List, Tuple
+
+ import gradio as gr
+
+ from app.graph import run_pipeline, run_fusion_pipeline
+ from app.state import AppState
+
+
+ def _zip_outdir(outdir: str) -> str:
+     out = Path(outdir)
+     if not out.exists():
+         return ""
+     zip_path = str(out) + ".zip"
+     # Remove a stale archive if one exists
+     try:
+         if Path(zip_path).exists():
+             Path(zip_path).unlink()
+     except Exception:
+         pass
+     shutil.make_archive(str(out), "zip", root_dir=str(out))
+     return zip_path
+
+
+ def run_text_mode(user_text: str, K: int, T: int, make_zip: bool) -> Tuple[str, List[str], str, str]:
+     state: AppState = {"K": int(K), "T": int(T), "user_text": user_text or "", "outdir": ""}
+     final_state = run_pipeline(state)
+     outdir = final_state["outdir"]
+     # Collect candidates if present
+     candidates = [im.path for im in (final_state.get("images") or [])]
+     final_img = str(Path(outdir) / "final.png")
+     zip_path = _zip_outdir(outdir) if make_zip else ""
+     return final_img, candidates, outdir, zip_path
+
+
+ def run_image_mode(base_image, ref_images, instructions: str, K: int, make_zip: bool) -> Tuple[str, List[str], str, str]:
+     state: AppState = {"K": int(K), "T": 0, "outdir": "", "instructions": instructions or ""}
+     if base_image is not None:
+         state["base_image"] = base_image if isinstance(base_image, str) else base_image.name
+     refs: List[str] = []
+     for f in (ref_images or []):
+         p = f if isinstance(f, str) else getattr(f, "name", None)
+         if p:
+             refs.append(p)
+     state["ref_images"] = refs
+
+     final_state = run_fusion_pipeline(state)
+     outdir = final_state["outdir"]
+     candidates = [im.path for im in (final_state.get("images") or [])]
+     final_img = str(Path(outdir) / "final.png")
+     zip_path = _zip_outdir(outdir) if make_zip else ""
+     return final_img, candidates, outdir, zip_path
+
+
+ def app() -> gr.Blocks:
+     with gr.Blocks(title="NNGen — Gemini 2.5 Flash Image") as demo:
+         gr.Markdown("""
+         # NNGen — Gemini 2.5 Flash Image
+         - Text mode: enter a natural language spec to generate a diagram (G1/G2/judge/edit).
+         - Image mode: edit/fuse images with textual instructions (e.g., replace UNet with Transformer).
+         - Works offline with placeholders if no `GEMINI_API_KEY` is set. With an API key, set `GEMINI_IMAGE_MODEL` and `GEMINI_IMAGE_EDIT_MODEL`.
+         """)
+
+         with gr.Tab("Text Mode"):
+             user_text = gr.Textbox(label="NN spec (text)", lines=10, placeholder="Describe the architecture... e.g., Transformer encoder-decoder with cross-attention...")
+             with gr.Row():
+                 K = gr.Slider(1, 6, value=4, step=1, label="K candidates")
+                 T = gr.Slider(0, 3, value=1, step=1, label="Max edit rounds (T)")
+                 zip_output = gr.Checkbox(value=False, label="Zip outputs")
+             run_btn = gr.Button("Generate")
+             final_img = gr.Image(label="final.png", type="filepath")
+             # Gradio 4.x removed Component.style(); pass layout options as constructor kwargs
+             gallery = gr.Gallery(label="Candidates", columns=4)
+             outdir = gr.Textbox(label="Artifacts directory", interactive=False)
+             zip_file = gr.File(label="Download run.zip", interactive=False)
+
+             run_btn.click(run_text_mode, inputs=[user_text, K, T, zip_output], outputs=[final_img, gallery, outdir, zip_file])
+
+         with gr.Tab("Image Mode (Fusion/Edit)"):
+             base = gr.Image(label="Base image (optional)", type="filepath")
+             refs = gr.Files(label="Reference images (0..N)")
+             instr = gr.Textbox(label="Instructions", lines=4, placeholder="Replace the UNet backbone with a Transformer (DiT); keep layout, fonts, colors, arrows, and dashed groups unchanged.")
+             with gr.Row():
+                 K2 = gr.Slider(1, 6, value=4, step=1, label="K candidates")
+                 zip_output2 = gr.Checkbox(value=False, label="Zip outputs")
+             run_btn2 = gr.Button("Compose / Edit")
+             final_img2 = gr.Image(label="final.png", type="filepath")
+             gallery2 = gr.Gallery(label="Fused Candidates", columns=4)
+             outdir2 = gr.Textbox(label="Artifacts directory", interactive=False)
+             zip_file2 = gr.File(label="Download run.zip", interactive=False)
+
+             run_btn2.click(run_image_mode, inputs=[base, refs, instr, K2, zip_output2], outputs=[final_img2, gallery2, outdir2, zip_file2])
+
+     return demo
+
+
+ if __name__ == "__main__":
+     port = int(os.getenv("PORT", "7860"))
+     app().launch(server_name="0.0.0.0", server_port=port)
+
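Because the handlers are plain functions, they can be smoke-tested without launching the UI (a sketch; with no API key set this runs against the local placeholders, and assumes it is executed from the NNGen root so `scripts` and `app` import):

# Minimal sketch: call the text-mode handler directly, bypassing Gradio.
from scripts.gradio_app import run_text_mode

final_img, candidates, outdir, zip_path = run_text_mode(
    user_text="ViT: patch embedding -> encoder xL -> class head",
    K=2, T=0, make_zip=False,
)
print(outdir, final_img, len(candidates))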
NNGen/spec/transformer.txt ADDED
@@ -0,0 +1,32 @@
+ Title: Transformer Encoder–Decoder (Machine Translation)
+
+ Instructions:
+ - Produce a clean paper-style architecture diagram: flat 2D, white background, minimal color (black/gray + ≤2 accent colors), no gradients/3D/shadows.
+ - Layout left→right. Clear arrows indicate data flow.
+ - Draw boxes for major modules and show grouping for repeated layers.
+ - Text labels should be concise and capitalized (e.g., EMBEDDING, ENCODER xN, DECODER xN).
+
+ Architecture:
+ - Input: tokenized source sentence
+ - Source Embedding + Positional Encoding
+ - Encoder (N layers):
+   - Multi-Head Self-Attention
+   - Add & LayerNorm
+   - Feed-Forward (MLP)
+   - Add & LayerNorm
+ - Target: previous target tokens (for training)
+ - Target Embedding + Positional Encoding
+ - Decoder (N layers):
+   - Masked Multi-Head Self-Attention
+   - Add & LayerNorm
+   - Cross-Attention (attends to Encoder outputs)
+   - Add & LayerNorm
+   - Feed-Forward (MLP)
+   - Add & LayerNorm
+ - Output: Linear + Softmax
+
+ Style details:
+ - Use a dashed rounded rectangle to group the N repeated layers on both encoder and decoder.
+ - Keep arrows straight. Left→right overall; show a connection from Encoder outputs to the Cross-Attention in Decoder.
+ - If space is tight, abbreviate labels (e.g., SELF-ATTN, CROSS-ATTN, FFN).
+
NNGen/spec/vit.txt ADDED
@@ -0,0 +1,7 @@
+ Generate a high-level diagram of a Vision Transformer (ViT):
+ - Input: 224×224 RGB image
+ - Patch Embedding: split into 16×16 patches and apply a linear projection
+ - Add CLS token and positional encoding
+ - Transformer Encoder stack: Multi-Head Self-Attention + MLP + residual + LayerNorm (repeat L layers)
+ - Classification head: take CLS token for linear classification
+ Layout requirements: left-to-right flow; clear arrow directions; correct spelling of all labels; show the number of layers L; keep colors readable.