Multi-Agent Neural Network Diagram Generator (Skeleton) — Gemini 2.5 Flash Image

This repository is a minimal, runnable skeleton that turns a textual NN spec into a publication-style diagram via a multi-agent pipeline:

Parser → Planner → Prompt-Generator → Image-Generator (G1) → Label-Generator (G2) → Judge → Selector → (Editor loop) → Archivist
All model calls flow through call_gemini(...), making it easy to use Gemini 2.5 Flash for text and Gemini 2.5 Flash Image for images.
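
The stage chain above can be pictured as a list of functions that each take and return a shared state. The following is a minimal sketch, not the project's actual code: the `call_gemini` signature, the state keys, and the stage bodies are all assumptions made for illustration.

```python
from typing import Callable, Dict, List

State = Dict[str, object]
Stage = Callable[[State], State]


def call_gemini(kind: str, prompt: str) -> str:
    """Hypothetical stand-in for the project's wrapper; returns a stub string."""
    return f"[{kind}] {prompt}"


def parser(state: State) -> State:
    # Assumed behavior: turn the raw spec text into a parsed description.
    return {**state, "parsed": call_gemini("text", f"parse: {state['spec']}")}


def planner(state: State) -> State:
    # Assumed behavior: derive a layout plan from the parsed description.
    return {**state, "plan": call_gemini("text", f"plan: {state['parsed']}")}


def run_pipeline(state: State, stages: List[Stage]) -> State:
    # Each stage receives the output of the previous one.
    for stage in stages:
        state = stage(state)
    return state


result = run_pipeline({"spec": "ViT"}, [parser, planner])
print(result["plan"])
```

Routing every stage through one wrapper means a change of model or an offline stub only has to be made in one place.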

Key additions in this version

Two-stage generation: G1 draws the geometry-only skeleton (no text), G2 overlays labels on top of the skeleton.
Hard violations: Judge returns actionable violations; missing labels are flagged as HARD to trigger edits reliably.
Parallelism: G1, G2, and Judge each process their candidates in parallel; set NNG_CONCURRENCY to change the worker count (default 4).
Remote images by default: image generation and editing use Gemini 2.5 Flash Image models. If GEMINI_API_KEY is not set, the system falls back to local placeholders so the pipeline stays runnable.
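
As an illustration of the parallelism setting, here is a minimal sketch of reading NNG_CONCURRENCY and fanning a per-candidate function out over a thread pool. The helper names `get_concurrency` and `map_candidates` are hypothetical, and the use of threads is an assumption; the project may schedule work differently.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def get_concurrency(default: int = 4) -> int:
    """Read NNG_CONCURRENCY, falling back to the default on bad input."""
    raw = os.environ.get("NNG_CONCURRENCY", "")
    try:
        value = int(raw)
    except ValueError:
        return default
    return value if value > 0 else default


def map_candidates(fn: Callable[[T], R], items: Iterable[T]) -> List[R]:
    """Apply fn to every candidate concurrently, preserving input order."""
    with ThreadPoolExecutor(max_workers=get_concurrency()) as pool:
        return list(pool.map(fn, items))


# Example: pretend each candidate index is "rendered" by squaring it.
print(map_candidates(lambda i: i * i, range(4)))
```

Threads suit this kind of workload because the stages mostly wait on network calls rather than use the CPU.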

Quick Start

Python 3.10+
Install deps

pip install -r requirements.txt

Configure Gemini (choose one)

Env var: export GEMINI_API_KEY=YOUR_KEY
File: create app/llm/credentials.py with GEMINI_API_KEY = "YOUR_KEY"
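
The two configuration options could be resolved along these lines. This is a sketch under assumptions: the README does not state which source wins when both are present, so the env-var-first order and the `resolve_api_key` helper are illustrative only.

```python
import importlib
import os
from typing import Optional


def resolve_api_key(module_name: str = "app.llm.credentials") -> Optional[str]:
    """Return the Gemini key from the environment, else from a credentials module.

    Assumption: the environment variable takes precedence over the file.
    """
    key = os.environ.get("GEMINI_API_KEY")
    if key:
        return key
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # No credentials file: the caller can switch to offline placeholders.
        return None
    return getattr(module, "GEMINI_API_KEY", None) or None


if resolve_api_key() is None:
    print("No key found; running with offline placeholders.")
```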

Run (K=candidates, T=max edit rounds)

# Text mode (spec -> image)
python -m app.cli --mode text --spec spec/vit.txt --K 4 --T 1

# Image mode (text + image fusion/edit)
# Example: edit an existing diagram with a component replacement using a reference image
python -m app.cli --mode image --base-image path/to/base.png \
  --ref-image path/to/transformer_ref.png \
  --instructions "Replace the UNet backbone with a Transformer (DiT); keep layout, font, and colors consistent."

Artifacts are saved under artifacts/run_YYYYmmdd_HHMMSS/ with final.png as the chosen result.
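
The run directory name follows a timestamp pattern that maps directly onto `strftime`. A minimal sketch, assuming the local clock is used; the `run_dir` helper is hypothetical:

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def run_dir(root: str = "artifacts", now: Optional[datetime] = None) -> Path:
    """Build artifacts/run_YYYYmmdd_HHMMSS for the given (or current) time."""
    now = now or datetime.now()
    return Path(root) / now.strftime("run_%Y%m%d_%H%M%S")


out = run_dir(now=datetime(2025, 1, 2, 3, 4, 5))
print((out / "final.png").as_posix())  # artifacts/run_20250102_030405/final.png
```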

Gemini 2.5 Flash Image in This Project

G1 geometry: gen_generate.py calls GEMINI_IMAGE_MODEL (Gemini 2.5 Flash Image) to render a clean, geometry-only skeleton quickly.
G2 labels: gen_labels.py uses GEMINI_IMAGE_EDIT_MODEL to overlay text labels onto the G1 skeleton without redrawing everything.
Edit loop: edit.py performs targeted corrections via the same image model, enabling fast, iterative refinements instead of full regenerations.
Why it matters: the model’s speed and editability make multi-round diagram refinement practical while preserving layout quality.
Fallback: if no API key is available, the pipeline remains runnable using local placeholders generated by app/llm/gemini.py.
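
The edit loop can be sketched as "judge, and while hard violations remain, edit and judge again, up to T rounds". The code below is an illustration only: the `Verdict` shape, the `refine` function, and the toy judge and editor (which treat an "image" as a set of label strings) are assumptions, not the project's implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, FrozenSet, List, Tuple

Image = FrozenSet[str]  # toy stand-in: the set of labels present in a diagram


@dataclass
class Verdict:
    score: float
    hard_violations: List[str] = field(default_factory=list)


def refine(image: Image,
           judge: Callable[[Image], Verdict],
           edit: Callable[[Image, List[str]], Image],
           max_rounds: int) -> Tuple[Image, Verdict]:
    """Run up to max_rounds targeted edits while hard violations remain."""
    verdict = judge(image)
    for _ in range(max_rounds):
        if not verdict.hard_violations:
            break
        image = edit(image, verdict.hard_violations)
        verdict = judge(image)
    return image, verdict


REQUIRED = {"Patch Embedding", "Encoder", "MLP Head"}


def fake_judge(labels: Image) -> Verdict:
    # A missing label is reported as a hard violation.
    missing = sorted(REQUIRED - labels)
    return Verdict(1 - len(missing) / len(REQUIRED),
                   [f"missing label: {m}" for m in missing])


def fake_edit(labels: Image, violations: List[str]) -> Image:
    # Fix one violation per round to mimic an incremental edit.
    return labels | {violations[0].split(": ", 1)[1]}


final, verdict = refine(frozenset({"Encoder"}), fake_judge, fake_edit, max_rounds=3)
print(sorted(final), verdict.hard_violations)
```

With `max_rounds=1` the same input would still carry one hard violation, which is why raising T allows more corrections.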

Models

GEMINI_MODEL (default gemini-2.5-flash): parsing, planning, prompt generation, and judging.
GEMINI_IMAGE_MODEL (recommended gemini-2.5-flash-image or gemini-2.5-flash-image-preview): image generation (G1).
GEMINI_IMAGE_EDIT_MODEL (recommended gemini-2.5-flash-image or gemini-2.5-flash-image-preview): image editing (G2, Editor).

Notes: If GEMINI_API_KEY is not set, the pipeline uses offline placeholders to remain runnable. With an API key present, you must set valid image model env vars; errors are raised if image models are unset or calls fail (no automatic local fallback).
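
The rules for the model variables can be summarized in a small config loader. This is a sketch, assuming only environment variables are consulted (the credentials file is ignored here); `ModelConfig` and `load_model_config` are hypothetical names.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ModelConfig:
    offline: bool
    text_model: str
    image_model: Optional[str]
    image_edit_model: Optional[str]


def load_model_config(env: Mapping[str, str] = os.environ) -> ModelConfig:
    text_model = env.get("GEMINI_MODEL", "gemini-2.5-flash")
    image_model = env.get("GEMINI_IMAGE_MODEL") or None
    edit_model = env.get("GEMINI_IMAGE_EDIT_MODEL") or None

    # No key: stay runnable with offline placeholders.
    if not env.get("GEMINI_API_KEY"):
        return ModelConfig(True, text_model, image_model, edit_model)

    # Key present: image models must be set explicitly, with no local fallback.
    missing = [name for name, value in (
        ("GEMINI_IMAGE_MODEL", image_model),
        ("GEMINI_IMAGE_EDIT_MODEL", edit_model),
    ) if not value]
    if missing:
        raise RuntimeError("Unset image model variable(s): " + ", ".join(missing))
    return ModelConfig(False, text_model, image_model, edit_model)


print(load_model_config({}).offline)  # True
```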

Fusion Mode (Text + Image)

Accepts a base diagram (--base-image), optional reference images (--ref-image, repeatable), and instructions (--instructions).
Uses Gemini 2.5 Flash Image to compose images under textual guidance – ideal for swapping a module (e.g., UNet → Transformer) while preserving style and layout.
Outputs multiple fused candidates (K) and archives the first as final.png.
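
A repeatable flag such as --ref-image is commonly modeled with argparse's `append` action. The sketch below uses only the flags that appear in this README; the defaults and the parser layout are assumptions and may differ from the real app/cli.py.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(prog="app.cli")
    p.add_argument("--mode", choices=["text", "image"], default="text")
    p.add_argument("--spec", help="path to a textual NN spec (text mode)")
    p.add_argument("--base-image", help="diagram to edit (image mode)")
    # action="append" collects every occurrence of the flag into a list.
    p.add_argument("--ref-image", action="append", default=[])
    p.add_argument("--instructions", help="textual guidance for the fusion")
    p.add_argument("--K", type=int, default=4, help="number of candidates")
    p.add_argument("--T", type=int, default=1, help="max edit rounds")
    return p


args = build_parser().parse_args([
    "--mode", "image",
    "--base-image", "base.png",
    "--ref-image", "ref_a.png",
    "--ref-image", "ref_b.png",
    "--instructions", "Replace the UNet backbone with a Transformer",
])
print(args.ref_image)  # ['ref_a.png', 'ref_b.png']
```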

Structure

app/
  cli.py              # CLI entry (K/T/outdir)
  graph.py            # Orchestrator + edit loop
  state.py            # AppState + artifacts
  prompts.py          # Centralized prompts (parse/plan/G1/G2/judge/edit)
  nodes/
    parser.py, planner.py, prompt_gen.py
    gen_generate.py   # G1 skeleton images (no text)
    gen_labels.py     # G2 label overlay edits
    judge.py, select.py, edit.py, archive.py
  llm/
    gemini.py         # Unified wrapper (API + offline fallback)
    credentials.example.py
spec/
  vit.txt             # Example ViT spec (English)
artifacts/            # Outputs per run

Tips

Concurrency: NNG_CONCURRENCY=4 python -m app.cli --spec ...
Tuning: Start with K=4, T=1; increase T for more correction rounds.
Debug: image calls write *.resp.txt/*.meta.json alongside outputs (can be removed later if undesired).