---
title: Gprmax Support
emoji: 👀
colorFrom: yellow
colorTo: purple
sdk: gradio
sdk_version: 5.44.1
app_file: app.py
pinned: true
---

# gprMax AI Support Assistant (GSoC 2025)

**What it is:** a small web app that helps people write gprMax `.in` files, understand commands, and troubleshoot simulations in a simple chat UI.

**Why it matters:** new users struggle with syntax and parameter choices. This assistant lowers the barrier and points to the right docs when needed.

**Live demo:** [Gprmax Support - a Hugging Face Space by jfang](https://huggingface.co/spaces/jfang/gprmax-support-gsoc25)

**Main model used by the app:** `jfang/gprmax-ft-Qwen3-4B-Instruct`. The app loads this model with Hugging Face Transformers and streams responses, including a separate "thinking" pane for learning and transparency.

---

## What I built (GSoC progress)

- **Fine-tuned model for gprMax.** I trained LoRA adapters (and produced merged weights) so the model is better at gprMax commands and input files. The Space loads `jfang/gprmax-ft-Qwen3-4B-Instruct`.
- **RAG (Retrieval-Augmented Generation)** on top of the official gprMax documentation. On first run, the app clones the repo, chunks the `/docs` files, and creates a **persistent ChromaDB** store. The model can then "call a tool" to search the docs and show sources.
- **Friendly UI** with Gradio: the left side is chat; the right side has two collapsible panels, **AI Thinking Process** and **Documentation Sources**. There is also a **Settings** panel for tuning temperature, max tokens, etc.
- **Reproducible fine-tuning recipe** with LoRA (PEFT). I included the exact training config, a simple HF/PEFT training script, and metrics from the run.
- **Model Zoo (fine-tuned weights):** I trained several variants and organized them here: [https://huggingface.co/collections/jfang/gprmax-command-finetuned](https://huggingface.co/collections/jfang/gprmax-command-finetuned)

> The evaluation plan and overall approach follow the project proposal: set baselines, fine-tune with LoRA, add RAG, and then test by pass rate on required fields plus flexible checks on "creative" parts.

---

## Quick start

### 1) Use it online (Hugging Face Space)

1. Open the Space.
2. Ask a question like "How do I add a Ricker wavelet source?" or paste part of an input file.
3. Check the right panels:
   - **AI Thinking Process** shows the model's step-by-step reasoning.
   - **Documentation Sources** shows the retriever's citations and short previews.

> The Space wraps generation with `@spaces.GPU(duration=60)` to keep GPU usage small and predictable.

### 2) Run it locally

```bash
pip install "torch" "transformers" "gradio" "chromadb" "gitpython" "tqdm" "spaces"
gradio app.py
```

- First run: if the vector DB is missing, the app will **auto-build** it (clone gprMax, chunk the docs, and index them). You'll see logs about generating the database and then "RAG database loaded."
- The database is **persistent** (on disk), so later runs are faster. The builder stores a `metadata.json` with settings such as chunk size and the embedding name used by Chroma (`all-MiniLM-L6-v2` by default).
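If you want to sanity-check the RAG store outside the app, you can query the persistent ChromaDB directly. Below is a minimal sketch, assuming the layout described under "Project layout" further down (`rag-db/chroma_db/` on disk, collection `gprmax_docs_v1`); the metadata keys printed here are whatever `generate_db.py` stored and may differ from this example.

```python
# check_db.py - minimal sketch for inspecting the persistent RAG database.
# Assumes the store lives in rag-db/chroma_db and uses the gprmax_docs_v1 collection.
import chromadb

client = chromadb.PersistentClient(path="rag-db/chroma_db")
collection = client.get_collection("gprmax_docs_v1")
print("Indexed chunks:", collection.count())

results = collection.query(
    query_texts=["How do I add a Ricker wavelet source?"],
    n_results=3,
)
for doc, meta, dist in zip(
    results["documents"][0], results["metadatas"][0], results["distances"][0]
):
    score = 1 - dist / 2  # same distance-to-score conversion the retriever uses for display
    print(f"[{score:.2f}] {meta}")
    print(doc[:200], "...")
```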
---

## Using the app (what to try)

Ask things like:

- "How do I create a basic gprMax input file for a simple GPR simulation?"
- "What's the difference between `#domain` and `#dx_dy_dz`?"
- "How do I add a Ricker wavelet source?"
- "My simulation is taking too long - any tips to speed it up?"
- "How do I model a soil with different dielectric properties?"

When the model needs context, it emits a small JSON "tool call" to **search_documentation**. The retriever queries ChromaDB, and the UI shows the top matches in the right panel with file names and a short preview. The model then writes a final answer that uses those snippets.

---

## Design principles (in simple terms)

- **Keep it modular.** Model, retriever, and UI are separate pieces. We can upgrade any part later.
- **Ground answers in docs.** The model can look things up and show sources, not just "guess."
- **Make it light.** A 4B model plus a local vector DB runs on modest hardware and fits on Spaces.
- **Be transparent.** Show what the model is thinking and where facts come from.
- **Future-proof.** Rebuild the DB when docs change; swap in new models or embeddings later.

---

## Architecture (at a glance)

```
User ↔ Gradio Chat UI
        │
        ▼
Transformers (Qwen3-4B fine-tuned) → streams text + <think>...</think>
        │  (optional tool call as JSON)
        ▼
search_documentation(query)
        │
        ▼
GprMaxRAGRetriever ── ChromaDB (persistent on disk)
        │                   │
        ▼                   ▼
gprMax docs (cloned → chunked → indexed)
```

- **Model loading & streaming.** The app uses `AutoTokenizer`/`AutoModelForCausalLM` with `device_map="auto"`. The generator splits the `<think>...</think>` block into a separate "AI Thinking Process" pane.
- **Tool calling.** The system prompt describes a `search_documentation` tool and the exact JSON format for calling it.
- **RAG database.** The builder clones the official `gprMax` repo, reads `/docs` (`.rst`, `.md`, `.txt`), chunks with **size 1000 / overlap 200**, and stores to a **ChromaDB** collection named `gprmax_docs_v1`. Metadata includes `embedding_model: "ChromaDB Default (all-MiniLM-L6-v2)"`.
- **Retriever.** Uses a persistent Chroma client and queries via `query_texts`. Distances are turned into display scores with a simple `1 - (dist / 2)` conversion.

---

## Technical choices (frameworks and why)

- **Transformers** to load and run the fine-tuned Qwen 4B model, with `device_map="auto"` and `trust_remote_code=True`. This keeps the code short and makes GPU/CPU selection automatic.
- **Gradio** for the web UI (Blocks + Chatbot + Accordions + Sliders). It's easy to read and extend.
- **ChromaDB** for a simple, persistent vector store that ships with the app. No external service is required.
- **GitPython + tqdm** to clone the gprMax docs and show progress when building the DB.

---

## Reproducible fine-tuning (LoRA / PEFT)

This is the core of the work. Below is **exactly** how the 4B model was trained and how someone else can redo it.

### What I trained

- **Base model:** `Qwen/Qwen3-4B` (using the Qwen3 chat template).
- **Method:** LoRA adapters (**rank=8**, **alpha=16**, **dropout=0.0**) applied to the attention and MLP projection layers.
- **Outputs:** adapters + merged weights; the app uses the merged variant `jfang/gprmax-ft-Qwen3-4B-Instruct`.
- **Other models I trained:** see my collection: [https://huggingface.co/collections/jfang/gprmax-command-finetuned](https://huggingface.co/collections/jfang/gprmax-command-finetuned)
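The training data (`data/gpr-train.jsonl`) is consumed by the script below in a chat `messages` format via the tokenizer's chat template. The record here is purely illustrative; the actual gpr-train examples are not reproduced, and only the assumed schema is shown.

```python
# example_record.py - writes ONE hypothetical training record in the chat "messages"
# schema that the HF/PEFT training script below expects. Illustrative only; the
# contents of the real gpr-train dataset may differ.
import json

record = {
    "messages": [
        {"role": "user",
         "content": "Write a minimal gprMax input file for a 2D model with a 100 MHz Ricker source."},
        {"role": "assistant",
         "content": "#domain: 0.5 0.5 0.002\n#dx_dy_dz: 0.002 0.002 0.002\n..."},
    ]
}

with open("data/gpr-train.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(record) + "\n")
```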
### Exact config used (YAML)

```yaml
bf16: true
cutoff_len: 2048
dataset: gpr-train
dataset_dir: data
ddp_timeout: 180000000
do_train: true
enable_thinking: true
finetuning_type: lora
flash_attn: auto
gradient_accumulation_steps: 8
include_num_input_tokens_seen: true
learning_rate: 5.0e-05
logging_steps: 5
lora_alpha: 16
lora_dropout: 0
lora_rank: 8
lora_target: all
lr_scheduler_type: cosine
max_grad_norm: 1.0
max_samples: 100000
model_name_or_path: Qwen/Qwen3-4B
num_train_epochs: 2.0
optim: adamw_torch
output_dir: saves/Qwen3-4B-Instruct/lora/train_2025-07-09-08-47-27
packing: false
per_device_train_batch_size: 4
plot_loss: true
preprocessing_num_workers: 16
report_to: none
save_steps: 100
stage: sft
template: qwen3
trust_remote_code: true
warmup_steps: 0
```

**Metrics reported (4B run):**

```json
{
  "epoch": 2.0,
  "num_input_tokens_seen": 48562016,
  "total_flos": 1.0635160197775688e+18,
  "train_loss": 0.3312762507200241,
  "train_runtime": 16760.735,
  "train_samples_per_second": 1.909,
  "train_steps_per_second": 0.06
}
```

**Loss curve:**

![Training loss](training_loss.png)

### Simple HF/PEFT training script

```python
# train_lora_peft.py
import torch
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForCausalLM, TrainingArguments
from trl import SFTTrainer
from peft import LoraConfig

BASE = "Qwen/Qwen3-4B"

tok = AutoTokenizer.from_pretrained(BASE, trust_remote_code=True)
tok.padding_side = "right"
if tok.pad_token is None:
    tok.pad_token = tok.eos_token

ds = load_dataset("json", data_files={"train": "data/gpr-train.jsonl"})

def to_text(ex):
    return {"text": tok.apply_chat_template(ex["messages"], tokenize=False, add_generation_prompt=False)}

ds = ds.map(to_text, remove_columns=ds["train"].column_names)

dtype = torch.bfloat16 if torch.cuda.is_available() else torch.float32
model = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=dtype, device_map="auto", trust_remote_code=True)

peft_cfg = LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)

args = TrainingArguments(
    output_dir="saves/Qwen3-4B-Instruct/lora/run-peft",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,
    learning_rate=5e-5,
    num_train_epochs=2,
    lr_scheduler_type="cosine",
    logging_steps=5,
    save_steps=100,
    bf16=True,
    report_to="none",
    max_grad_norm=1.0,
)

trainer = SFTTrainer(
    model=model,
    args=args,
    peft_config=peft_cfg,
    tokenizer=tok,
    train_dataset=ds["train"],
    dataset_text_field="text",
    max_seq_length=2048,
    packing=False,
)
trainer.train()
trainer.save_model("saves/Qwen3-4B-Instruct/lora/run-peft")
tok.save_pretrained("saves/Qwen3-4B-Instruct/lora/run-peft")
```

**Inference with the adapter (or merge):**

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

base = "Qwen/Qwen3-4B"
adapter = "saves/Qwen3-4B-Instruct/lora/run-peft"

tok = AutoTokenizer.from_pretrained(base, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(base, device_map="auto", torch_dtype=torch.bfloat16, trust_remote_code=True)
model = PeftModel.from_pretrained(model, adapter)

prompt = tok.apply_chat_template(
    [{"role": "user", "content": "Give a minimal gprMax 2D model with a 100 MHz Ricker source."}],
    tokenize=False, add_generation_prompt=True
)
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=512)
print(tok.decode(out[0], skip_special_tokens=True))

# Optional: merge LoRA into the base weights for publishing
# model = model.merge_and_unload()
# model.save_pretrained("merged-qwen3-4b-gprmax")
# tok.save_pretrained("merged-qwen3-4b-gprmax")
```
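The next section notes that `app.py` streams the thinking text (between `<think>` and `</think>`) into a separate pane. Below is a minimal, non-streaming sketch of that split, assuming Qwen3's standard thinking tags; `split_thinking` is an illustrative helper, not a function taken from `app.py`.

```python
# split_thinking.py - minimal, non-streaming sketch of separating a Qwen3-style
# <think>...</think> block from the final answer. app.py does this incrementally
# while streaming; this helper is illustrative only.
import re

def split_thinking(completion: str) -> tuple[str, str]:
    """Return (thinking, answer) extracted from a raw model completion."""
    match = re.search(r"<think>(.*?)</think>", completion, flags=re.DOTALL)
    if match is None:
        return "", completion.strip()
    thinking = match.group(1).strip()
    answer = completion[match.end():].strip()
    return thinking, answer

raw = "<think>The user wants a Ricker source, so #waveform is needed.</think>Use `#waveform: ricker 1 100e6 my_ricker` ..."
thinking, answer = split_thinking(raw)
print("THINKING:", thinking)  # would feed the "AI Thinking Process" pane
print("ANSWER:", answer)      # would feed the chat
```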
### How the fine-tuned model plugs into the app

- `app.py` sets `MODEL_NAME = "jfang/gprmax-ft-Qwen3-4B-Instruct"` and uses `AutoTokenizer`/`AutoModelForCausalLM` with `device_map="auto"`. It also streams the **thinking** text (between `<think>` and `</think>`) to a separate UI pane.
- When the model emits the tool-call JSON for `search_documentation`, the app uses the retriever to query the local ChromaDB and shows the sources in the right pane.

---

## Project layout

```
.
├── app.py              # Main Gradio app: model load, streaming, tool-calling
└── rag-db/
    ├── generate_db.py  # Clone gprMax, chunk docs, build ChromaDB, save metadata
    ├── retriever.py    # Persistent Chroma client + search utilities
    └── chroma_db/      # (created at runtime) persistent vector DB + metadata.json
```

- If the DB is missing, the app **auto-builds** it by **cloning the gprMax GitHub repo and embedding the *latest* docs**, then loads it for searches.
- The builder saves `metadata.json` with the collection name (`gprmax_docs_v1`), the chunking settings, and the embedding label.
- The retriever uses a persistent client and turns distances into a simple score for display.

---

## Tips & troubleshooting

- **GPU out of memory?** Lower **Max New Tokens** in Settings or run on CPU; the app chooses CUDA if available, otherwise CPU.
- **No docs in the sources panel?** Build the DB manually:

  ```bash
  python rag-db/generate_db.py --recreate
  ```

  This clones the official repo, chunks `/docs` (size **1000**, overlap **200**), builds the `gprmax_docs_v1` collection, and writes the metadata.
- **First response is slow.** That's probably the first-time model load and DB creation. Later runs reuse the cached DB, so they are faster.
- Smaller models tend to **overthink** ([Cuadron et al., 2025](https://arxiv.org/abs/2502.08235)). We expect open-source models to keep improving, and the pipeline is modular and future-proof, so newer models can be swapped in.

## License note

The retriever indexes text from the official gprMax documentation. Please follow the gprMax license for any reuse of that content.

**Thanks:** the gprMax team and community, plus the open-source ML stack (Transformers, Gradio, ChromaDB).