Multi-Agent Neural Network Diagram Generator (Skeleton) - Gemini 2.5 Flash Image
This repository is a minimal, runnable skeleton that turns a textual NN spec into a publication-style diagram via a multi-agent pipeline:
- Parser → Planner → Prompt-Generator → Image-Generator (G1) → Label-Generator (G2) → Judge → Selector → (Editor loop) → Archivist
- All model calls flow through `call_gemini(...)`, making it easy to use Gemini 2.5 Flash for text and Gemini 2.5 Flash Image for images.
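The stage order above can be pictured as a chain of functions that each read and extend a shared state. The following is only a sketch under stated assumptions: the dictionary-based state, `make_stage`, and `run_pipeline` are hypothetical stand-ins rather than the repository's `AppState` or node implementations, and the optional Editor loop is left out.

```python
from typing import Callable, Dict, List

State = Dict[str, object]
Stage = Callable[[State], State]

# Stage order as listed in this README (Editor loop omitted).
STAGE_NAMES = [
    "parser", "planner", "prompt_generator", "image_generator_g1",
    "label_generator_g2", "judge", "selector", "archivist",
]


def make_stage(name: str) -> Stage:
    """Build a placeholder stage that only records that it ran (hypothetical)."""
    def stage(state: State) -> State:
        trace = list(state.get("trace", []))
        trace.append(name)
        return {**state, "trace": trace}
    return stage


def run_pipeline(spec_text: str, stages: List[Stage]) -> State:
    """Thread one state object through every stage, in order."""
    state: State = {"spec": spec_text, "trace": []}
    for stage in stages:
        state = stage(state)
    return state


result = run_pipeline("ViT-B/16 encoder", [make_stage(n) for n in STAGE_NAMES])
print(result["trace"])
```

Keeping every stage behind the same signature is one way to make stages easy to reorder or replace; the real orchestrator in `graph.py` may be organized differently.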
Key additions in this version
- Two-stage generation: G1 draws the geometry-only skeleton (no text), G2 overlays labels on top of the skeleton.
- Hard violations: Judge returns actionable violations; missing labels are flagged as HARD to trigger edits reliably.
- Parallelism: G1, G2, and Judge run in parallel; set `NNG_CONCURRENCY` (default 4).
- Remote images by default: image generation and editing use Gemini 2.5 Flash Image models. If no API key is configured, the system falls back to a local placeholder to stay runnable.
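As an illustration of the parallelism setting, the sketch below reads `NNG_CONCURRENCY` and fans work out over a thread pool. It assumes threads are used (a reasonable fit for I/O-bound API calls) and that invalid values fall back to 4; `get_concurrency`, `map_parallel`, and `fake_render` are hypothetical names, not functions from this repository.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def get_concurrency(default: int = 4) -> int:
    """Read NNG_CONCURRENCY; use the default if it is missing or invalid (assumed policy)."""
    raw = os.environ.get("NNG_CONCURRENCY", "")
    try:
        value = int(raw)
    except ValueError:
        return default
    return value if value > 0 else default


def map_parallel(fn: Callable[[T], R], items: Sequence[T]) -> List[R]:
    """Apply fn to every item on a thread pool; results keep the input order."""
    if not items:
        return []
    workers = min(get_concurrency(), len(items))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fn, items))


def fake_render(candidate_id: int) -> str:
    """Stand-in for a G1 image call; returns a file name instead of an image."""
    return f"candidate_{candidate_id}.png"


print(map_parallel(fake_render, [0, 1, 2, 3]))
```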
Quick Start
- Python 3.10+
- Install deps: `pip install -r requirements.txt`
- Configure Gemini (choose one)
  - Env var: `export GEMINI_API_KEY=YOUR_KEY`
  - File: create `app/llm/credentials.py` with `GEMINI_API_KEY = "YOUR_KEY"`
- Run (K=candidates, T=max edit rounds)
# Text mode (spec -> image)
python -m app.cli --mode text --spec spec/vit.txt --K 4 --T 1
# Image mode (text + image fusion/edit)
# Example: edit an existing diagram with a component replacement using a reference image
python -m app.cli --mode image --base-image path/to/base.png \
--ref-image path/to/transformer_ref.png \
--instructions "Replace the UNet backbone with a Transformer (DiT); keep layout, font, and colors consistent."
Artifacts are saved under artifacts/run_YYYYmmdd_HHMMSS/ with final.png as the chosen result.
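The `run_YYYYmmdd_HHMMSS` naming corresponds to a `strftime` pattern of `%Y%m%d_%H%M%S`. A minimal sketch of creating such a directory follows; `make_run_dir` is a hypothetical helper, and the repository's own archiving code may differ.

```python
import tempfile
from datetime import datetime
from pathlib import Path
from typing import Optional


def make_run_dir(root: Path, now: Optional[datetime] = None) -> Path:
    """Create and return root/run_YYYYmmdd_HHMMSS for the given (or current) time."""
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    run_dir = root / f"run_{stamp}"
    run_dir.mkdir(parents=True, exist_ok=True)
    return run_dir


# Use a temporary root so the example leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp:
    run_dir = make_run_dir(Path(tmp), datetime(2025, 1, 2, 3, 4, 5))
    print(run_dir.name)  # run_20250102_030405
```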
Gemini 2.5 Flash Image in This Project
- G1 geometry: `gen_generate.py` calls `GEMINI_IMAGE_MODEL` (Gemini 2.5 Flash Image) to render a clean, geometry-only skeleton quickly.
- G2 labels: `gen_labels.py` uses `GEMINI_IMAGE_EDIT_MODEL` to overlay text labels onto the G1 skeleton without redrawing everything.
- Edit loop: `edit.py` performs targeted corrections via the same image model, enabling fast, iterative refinements instead of full regenerations.
- Why it matters: the model's speed and editability make multi-round diagram refinement practical while preserving layout quality.
- Fallback: if no API key is available, the pipeline remains runnable using local placeholders generated by `app/llm/gemini.py`.
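The edit loop described above, bounded by `T` rounds and triggered by hard violations, might look roughly like the sketch below. Everything here is a toy stand-in: images are plain strings, and `Violation`, `refine`, `toy_judge`, and `toy_edit` are hypothetical names. It also assumes only hard violations are passed to the editor, which the source does not state.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Violation:
    message: str
    hard: bool  # hard violations (e.g. a missing label) trigger another edit


JudgeFn = Callable[[str], List[Violation]]
EditFn = Callable[[str, List[Violation]], str]


def refine(image: str, judge: JudgeFn, edit: EditFn, max_rounds: int) -> str:
    """Edit until no hard violations remain or max_rounds (T) is used up."""
    for _ in range(max_rounds):
        hard = [v for v in judge(image) if v.hard]
        if not hard:
            break
        image = edit(image, hard)
    return image


# Toy stand-ins: a label counts as present if its name appears in the string.
REQUIRED_LABELS = ["Patch Embedding", "MLP Head"]


def toy_judge(image: str) -> List[Violation]:
    return [Violation(f"missing label: {name}", hard=True)
            for name in REQUIRED_LABELS if name not in image]


def toy_edit(image: str, violations: List[Violation]) -> str:
    # Fix only the first violation per round so several rounds are needed.
    label = violations[0].message.split(": ", 1)[1]
    return f"{image}|{label}"


print(refine("skeleton", toy_judge, toy_edit, max_rounds=1))
print(refine("skeleton", toy_judge, toy_edit, max_rounds=3))
```

With `max_rounds=1` one label is still missing after the loop, while three rounds are enough to clear both, which mirrors the advice to raise `T` when more corrections are needed.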
Models
- `GEMINI_MODEL` (default `gemini-2.5-flash`): parsing, planning, prompt generation, and judging.
- `GEMINI_IMAGE_MODEL` (recommended `gemini-2.5-flash-image` or `gemini-2.5-flash-image-preview`): image generation (G1).
- `GEMINI_IMAGE_EDIT_MODEL` (recommended `gemini-2.5-flash-image` or `gemini-2.5-flash-image-preview`): image editing (G2, Editor).

Notes: If `GEMINI_API_KEY` is not set, the pipeline uses offline placeholders to remain runnable. With an API key present, you must set valid image model env vars; errors are raised if image models are unset or calls fail (no automatic local fallback).
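The rules in the notes can be summarized as a small configuration check. The sketch below is an assumption-laden illustration, not the logic in `app/llm/gemini.py`: `GeminiConfig` and `load_config` are hypothetical, only environment-style mappings are consulted (the `credentials.py` route is ignored), and the error type is a guess.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class GeminiConfig:
    offline: bool                    # True means local placeholders are used
    text_model: str
    image_model: Optional[str]
    image_edit_model: Optional[str]


def load_config(env: Mapping[str, str]) -> GeminiConfig:
    """Resolve model settings from an environment-like mapping."""
    text_model = env.get("GEMINI_MODEL") or "gemini-2.5-flash"
    image_model = env.get("GEMINI_IMAGE_MODEL") or None
    edit_model = env.get("GEMINI_IMAGE_EDIT_MODEL") or None

    if not env.get("GEMINI_API_KEY"):
        # No key: stay runnable with offline placeholders.
        return GeminiConfig(True, text_model, image_model, edit_model)

    # Key present: both image models must be set, with no silent fallback.
    required = {
        "GEMINI_IMAGE_MODEL": image_model,
        "GEMINI_IMAGE_EDIT_MODEL": edit_model,
    }
    missing = [name for name, value in required.items() if not value]
    if missing:
        raise RuntimeError("Missing image model env vars: " + ", ".join(missing))
    return GeminiConfig(False, text_model, image_model, edit_model)


offline_cfg = load_config({})
online_cfg = load_config({
    "GEMINI_API_KEY": "YOUR_KEY",
    "GEMINI_IMAGE_MODEL": "gemini-2.5-flash-image",
    "GEMINI_IMAGE_EDIT_MODEL": "gemini-2.5-flash-image",
})
print(offline_cfg.offline, online_cfg.offline)  # True False
```

In practice the mapping would be `os.environ`; explicit dictionaries are used here so the example does not depend on the caller's environment.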
Fusion Mode (Text + Image)
- Accepts a base diagram (`--base-image`) and optional reference images (`--ref-image`, repeatable) plus instructions.
- Uses Gemini 2.5 Flash Image to compose images under textual guidance, which suits swapping a module (e.g., UNet → Transformer) while preserving style and layout.
- Outputs multiple fused candidates (`K`) and archives the first as `final.png`.
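A repeatable flag such as `--ref-image` is commonly handled with `argparse`'s `append` action. The sketch below shows one way the documented flags could be parsed; it is not the actual `app/cli.py`, and the defaults (`text` mode, `K=4`, `T=1`) and the rule that image mode requires `--base-image` are assumptions.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="app.cli")
    parser.add_argument("--mode", choices=["text", "image"], default="text")
    parser.add_argument("--spec", help="path to a textual NN spec (text mode)")
    parser.add_argument("--base-image", help="diagram to edit (image mode)")
    # action="append" collects every occurrence of the flag into a list.
    parser.add_argument("--ref-image", action="append", default=None)
    parser.add_argument("--instructions")
    parser.add_argument("--K", type=int, default=4, help="number of candidates")
    parser.add_argument("--T", type=int, default=1, help="max edit rounds")
    return parser


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    args.ref_image = args.ref_image or []  # normalize "flag never given" to []
    if args.mode == "image" and not args.base_image:
        parser.error("--base-image is required when --mode image is used")
    return args


args = parse_args([
    "--mode", "image",
    "--base-image", "base.png",
    "--ref-image", "a.png",
    "--ref-image", "b.png",
    "--instructions", "Replace the UNet backbone with a Transformer (DiT)",
])
print(args.base_image, args.ref_image, args.K)
```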
Structure
app/
  cli.py                  # CLI entry (K/T/outdir)
  graph.py                # Orchestrator + edit loop
  state.py                # AppState + artifacts
  prompts.py              # Centralized prompts (parse/plan/G1/G2/judge/edit)
  nodes/
    parser.py, planner.py, prompt_gen.py
    gen_generate.py       # G1 skeleton images (no text)
    gen_labels.py         # G2 label overlay edits
    judge.py, select.py, edit.py, archive.py
  llm/
    gemini.py             # Unified wrapper (API + offline fallback)
    credentials.example.py
spec/
  vit.txt                 # Example ViT spec (English)
artifacts/                # Outputs per run
Tips
- Concurrency: `NNG_CONCURRENCY=4 python -m app.cli --spec ...`
- Tuning: start with `K=4, T=1`; increase `T` for more correction rounds.
- Debug: image calls write `*.resp.txt` / `*.meta.json` alongside outputs (can be removed later if undesired).