jfang committed 0496ab5 (verified) · parent: 3718631

Update README.md

Files changed (1): README.md (+356 −7)

---
title: Gprmax Support
emoji: 👀
colorFrom: yellow
colorTo: purple
sdk: gradio
sdk_version: 5.44.1
app_file: app.py
pinned: true
---

# gprMax AI Support Assistant (GSoC 2025)

**What it is:** a small web app that helps people write gprMax `.in` files, understand commands, and troubleshoot simulations in a simple chat UI.
**Why it matters:** new users struggle with syntax and parameter choices. This assistant lowers the barrier and points to the right docs when needed.

**Live demo:** [Gprmax Support - a Hugging Face Space by jfang](https://huggingface.co/spaces/jfang/gprmax-support-gsoc25)
**Main model used by the app:** `jfang/gprmax-ft-Qwen3-4B-Instruct`. The app loads this model with Hugging Face Transformers and streams responses, including a separate “thinking” pane for learning and transparency.

---

## What I built (GSoC progress)

- **Fine‑tuned model for gprMax**. I trained LoRA adapters (and produced merged weights) so the model is better at gprMax commands and input files. The Space loads `jfang/gprmax-ft-Qwen3-4B-Instruct`.

- **RAG (Retrieval‑Augmented Generation)** on top of the official gprMax documentation. On first run, the app clones the repo, chunks `/docs` files, and creates a **persistent ChromaDB** store. Then the model can “call a tool” to search docs and show sources.

- **Friendly UI** with Gradio: left side is chat; right side has two collapsible panels, **AI Thinking Process** and **Documentation Sources**. There are also **Settings** so people can tune temperature, max tokens, etc.

- **Reproducible fine‑tuning recipe** with LoRA (PEFT). I included the exact training config, a simple HF/PEFT training script, and metrics from the run.

- **Model Zoo (finetuned weights)**: I trained several variants and organized them here:
  [https://huggingface.co/collections/jfang/gprmax-command-finetuned](https://huggingface.co/collections/jfang/gprmax-command-finetuned)

> The evaluation plan and overall approach follow the project proposal: set baselines, fine‑tune with LoRA, add RAG, and then test by pass rate on required fields plus flexible checks on “creative” parts.

---

## Quick start

### 1) Use it online (Hugging Face Space)

1. Open the Space.

2. Ask a question like “How do I add a Ricker wavelet source?” or paste part of an input file.

3. Check the right panels:

   - **AI Thinking Process** shows the model’s step‑by‑step reasoning (what it’s thinking).

   - **Documentation Sources** shows the retriever’s citations and short previews.

> The Space wraps generation with `@spaces.GPU(duration=60)` to keep GPU usage small and predictable.
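
For reference, a minimal sketch of that decorator pattern (illustrative only; the function name and generation settings here are assumptions, not the Space’s actual code):

```python
# Illustrative sketch of the @spaces.GPU pattern; not the Space's actual code.
import spaces
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "jfang/gprmax-ft-Qwen3-4B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME, device_map="auto", torch_dtype=torch.bfloat16, trust_remote_code=True
)

@spaces.GPU(duration=60)              # request a GPU for at most ~60 s per call
def generate_reply(prompt: str) -> str:
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=512)
    return tokenizer.decode(output[0], skip_special_tokens=True)
```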

### 2) Run it locally

```bash
pip install "torch" "transformers" "gradio" "chromadb" "gitpython" "tqdm" "spaces"

gradio app.py
```

- First run: if the vector DB is missing, the app will **auto‑build** it (clone gprMax, chunk docs, and index). You’ll see logs about generating the database and then “RAG database loaded.”

- The database is **persistent** (on disk), so later runs are faster. The builder stores a `metadata.json` with settings like chunk size and the embedding name used by Chroma (“all‑MiniLM‑L6‑v2” by default); an illustrative example is shown below.
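
A hypothetical `metadata.json` along these lines (the exact field names are an assumption based on the settings above, not a dump of the real file):

```json
{
  "collection_name": "gprmax_docs_v1",
  "chunk_size": 1000,
  "chunk_overlap": 200,
  "embedding_model": "ChromaDB Default (all-MiniLM-L6-v2)"
}
```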

---

## Using the app (what to try)

Ask things like:

- “How do I create a basic gprMax input file for a simple GPR simulation?”

- “What’s the difference between `#domain` and `#dx_dy_dz`?”

- “How do I add a Ricker wavelet source?”

- “My simulation is taking too long—any tips to speed it up?”

- “How do I model a soil with different dielectric properties?”

When the model needs context, it emits a small JSON “tool call” to **search_documentation**. The retriever queries ChromaDB and the UI shows top matches in the right panel with file names and a short preview. Then the model writes a final answer that uses those snippets.
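
For illustration, such a tool call might look roughly like this (the exact field names are an assumption; the real format is whatever the system prompt in `app.py` specifies):

```json
{
  "tool": "search_documentation",
  "arguments": { "query": "Ricker wavelet source syntax" }
}
```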

---

## Design principles (in simple terms)

- **Keep it modular.** Model, retriever, and UI are separate pieces. We can upgrade any part later.

- **Ground answers in docs.** The model can look things up and show sources, not just “guess.”

- **Make it light.** A 4B model plus a local vector DB runs on modest hardware and fits on Spaces.

- **Be transparent.** Show what the model is thinking and where facts come from.

- **Future‑proof.** Rebuild the DB when docs change; swap in new models or embeddings later.

---

## Architecture (at a glance)

```
User ↔ Gradio Chat UI
        │
        ▼
Transformers (Qwen3‑4B fine‑tuned) → streams text + <think> ... </think>
        │
        ▼  (optional tool call as JSON)
search_documentation(query)
        │
        ▼
GprMaxRAGRetriever ── ChromaDB (persistent on disk)
        │                     │
        ▼                     ▼
   gprMax docs (cloned → chunked → indexed)
```

- **Model loading & streaming.** The app uses `AutoTokenizer/AutoModelForCausalLM` with `device_map="auto"`. The generator splits `<think>…</think>` into a separate “AI Thinking Process” pane.

- **Tool calling.** The system prompt describes a `search_documentation` tool and the exact JSON format for calling it.

- **RAG database.** The builder clones the official `gprMax` repo, reads `/docs` (`.rst`, `.md`, `.txt`), chunks with **size 1000 / overlap 200**, and stores to a **ChromaDB** collection named `gprmax_docs_v1`. Metadata includes `embedding_model: "ChromaDB Default (all‑MiniLM‑L6‑v2)"`.

- **Retriever.** Uses a persistent Chroma client and queries via `query_texts`. Distances are turned into scores with a simple `1 - (dist/2)` conversion for display (see the sketch after this list).
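
A minimal sketch of that retrieval step (the collection name and score conversion come from this README; the path, function name, and result handling are assumptions, not the actual `retriever.py`):

```python
# Minimal sketch of the retrieval step; not the app's actual retriever code.
import chromadb

client = chromadb.PersistentClient(path="rag-db/chroma_db")   # persistent store on disk
collection = client.get_collection("gprmax_docs_v1")          # collection built by generate_db.py

def search_documentation(query: str, n_results: int = 3):
    res = collection.query(query_texts=[query], n_results=n_results)
    hits = []
    for doc, meta, dist in zip(res["documents"][0], res["metadatas"][0], res["distances"][0]):
        hits.append({
            "source": meta.get("source", "unknown"),
            "preview": doc[:200],
            "score": 1 - (dist / 2),   # simple distance-to-score conversion for display
        })
    return hits
```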

---

## Technical choices (frameworks and why)

- **Transformers** to load and run the fine‑tuned Qwen3‑4B model, with `device_map="auto"` and `trust_remote_code=True`. This keeps the code short and makes GPU/CPU selection automatic (a loading/streaming sketch follows this list).

- **Gradio** for the web UI (Blocks + Chatbot + Accordions + Sliders). It’s easy to read and extend.

- **ChromaDB** for a simple, persistent vector store that ships with the app. No external service is required.

- **GitPython + tqdm** to clone the gprMax docs and show progress when building the DB.
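
A rough sketch of that loading-and-streaming pattern (illustrative; the real `app.py` differs in details such as prompt construction, the `<think>` handling, and UI wiring):

```python
# Illustrative sketch of the loading/streaming pattern; not the actual app.py.
from threading import Thread
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer

MODEL_NAME = "jfang/gprmax-ft-Qwen3-4B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    torch_dtype=torch.bfloat16 if torch.cuda.is_available() else torch.float32,
    device_map="auto",          # pick GPU if available, otherwise CPU
    trust_remote_code=True,
)

def stream_answer(messages, max_new_tokens=1024):
    prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    # keep special tokens so <think> ... </think> can be routed to the "thinking" pane
    streamer = TextIteratorStreamer(tokenizer, skip_prompt=True, skip_special_tokens=False)
    Thread(target=model.generate,
           kwargs=dict(**inputs, streamer=streamer, max_new_tokens=max_new_tokens)).start()
    for piece in streamer:      # the app splits out <think>…</think> before display
        yield piece
```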
145
+
146
+
147
+ ---
148
+
149
+ ## Reproducible fine‑tuning (LoRA / PEFT)
150
+
151
+ This is the core of the work. Below is **exactly** how the 4B model was trained and how someone else can redo it.
152
+
153
+ ### What I trained
154
+
155
+ - **Base model:** `Qwen/Qwen3-4B` (using the Qwen3 chat template).
156
+
157
+ - **Method:** LoRA adapters (**rank=8**, **alpha=16**, **dropout=0.0**) applied to attention and MLP projection layers.
158
+
159
+ - **Outputs:** adapters + merged weights; the app uses the merged variant `jfang/gprmax-ft-Qwen3-4B-Instruct`.
160
+
161
+ - **Other models I trained:** see my collection:
162
+ [https://huggingface.co/collections/jfang/gprmax-command-finetuned](https://huggingface.co/collections/jfang/gprmax-command-finetuned)
163
+
164
+
165
+

### Exact config used (YAML)

```yaml
bf16: true
cutoff_len: 2048
dataset: gpr-train
dataset_dir: data
ddp_timeout: 180000000
do_train: true
enable_thinking: true
finetuning_type: lora
flash_attn: auto
gradient_accumulation_steps: 8
include_num_input_tokens_seen: true
learning_rate: 5.0e-05
logging_steps: 5
lora_alpha: 16
lora_dropout: 0
lora_rank: 8
lora_target: all
lr_scheduler_type: cosine
max_grad_norm: 1.0
max_samples: 100000
model_name_or_path: Qwen/Qwen3-4B
num_train_epochs: 2.0
optim: adamw_torch
output_dir: saves/Qwen3-4B-Instruct/lora/train_2025-07-09-08-47-27
packing: false
per_device_train_batch_size: 4
plot_loss: true
preprocessing_num_workers: 16
report_to: none
save_steps: 100
stage: sft
template: qwen3
trust_remote_code: true
warmup_steps: 0
```
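
The field names (`stage: sft`, `finetuning_type: lora`, `lora_target: all`, `template: qwen3`) look like a LLaMA‑Factory training config. If that is indeed how it was produced, a run along these lines should reproduce it (the command and file name are assumptions, adjust to your setup):

```bash
# Assumes the YAML above is saved as qwen3_4b_gprmax_lora.yaml and LLaMA-Factory is installed
llamafactory-cli train qwen3_4b_gprmax_lora.yaml
```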

**Metrics reported (4B run):**

```json
{
  "epoch": 2.0,
  "num_input_tokens_seen": 48562016,
  "total_flos": 1.0635160197775688e+18,
  "train_loss": 0.3312762507200241,
  "train_runtime": 16760.735,
  "train_samples_per_second": 1.909,
  "train_steps_per_second": 0.06
}
```

**Loss curve:**

![Training loss curve](training_loss.png)

### Path A — Simple HF/PEFT training script

```python
# train_lora_peft.py
import torch
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForCausalLM, TrainingArguments
from trl import SFTTrainer
from peft import LoraConfig

BASE = "Qwen/Qwen3-4B"

tok = AutoTokenizer.from_pretrained(BASE, trust_remote_code=True)
tok.padding_side = "right"
if tok.pad_token is None:
    tok.pad_token = tok.eos_token

ds = load_dataset("json", data_files={"train": "data/gpr-train.jsonl"})

def to_text(ex):
    return {"text": tok.apply_chat_template(ex["messages"], tokenize=False, add_generation_prompt=False)}

ds = ds.map(to_text, remove_columns=ds["train"].column_names)

dtype = torch.bfloat16 if torch.cuda.is_available() else torch.float32
model = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=dtype, device_map="auto", trust_remote_code=True)

peft_cfg = LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.0,
    target_modules=["q_proj","k_proj","v_proj","o_proj","gate_proj","up_proj","down_proj"],
    task_type="CAUSAL_LM"
)

args = TrainingArguments(
    output_dir="saves/Qwen3-4B-Instruct/lora/run-peft",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,
    learning_rate=5e-5,
    num_train_epochs=2,
    lr_scheduler_type="cosine",
    logging_steps=5,
    save_steps=100,
    bf16=True,
    report_to="none",
    max_grad_norm=1.0
)

trainer = SFTTrainer(
    model=model,
    args=args,
    peft_config=peft_cfg,
    tokenizer=tok,
    train_dataset=ds["train"],
    dataset_text_field="text",
    max_seq_length=2048,
    packing=False
)

trainer.train()
trainer.save_model("saves/Qwen3-4B-Instruct/lora/run-peft")
tok.save_pretrained("saves/Qwen3-4B-Instruct/lora/run-peft")
```
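
The script reads `data/gpr-train.jsonl` and expects one JSON object per line with a `messages` list. A record might look something like this (illustrative shape only, not taken from the real dataset, and wrapped here for readability):

```json
{"messages": [
  {"role": "user", "content": "Create a minimal 2D gprMax input file with a 100 MHz Ricker source."},
  {"role": "assistant", "content": "#domain: 0.6 0.6 0.002\n#dx_dy_dz: 0.002 0.002 0.002\n#time_window: 12e-9\n#waveform: ricker 1 100e6 my_ricker\n#hertzian_dipole: z 0.1 0.1 0 my_ricker"}
]}
```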

**Inference with adapter (or merge):**

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

base = "Qwen/Qwen3-4B"
adapter = "saves/Qwen3-4B-Instruct/lora/run-peft"

tok = AutoTokenizer.from_pretrained(base, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(base, device_map="auto", torch_dtype=torch.bfloat16, trust_remote_code=True)
model = PeftModel.from_pretrained(model, adapter)

prompt = tok.apply_chat_template(
    [{"role": "user", "content": "Give a minimal gprMax 2D model with a 100 MHz Ricker source."}],
    tokenize=False, add_generation_prompt=True
)
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=512)
print(tok.decode(out[0], skip_special_tokens=True))

# Optional: merge LoRA into base weights for publishing
# model = model.merge_and_unload()
# model.save_pretrained("merged-qwen3-4b-gprmax")
# tok.save_pretrained("merged-qwen3-4b-gprmax")
```
311
+
312
+ ### How the fine‑tuned model plugs into the app
313
+
314
+ - `app.py` sets `MODEL_NAME = "jfang/gprmax-ft-Qwen3-4B-Instruct"` and uses `AutoTokenizer/AutoModelForCausalLM` with `device_map="auto"`.
315
+ It also streams the **thinking** text (between `<think>...</think>`) to a separate UI pane.
316
+
317
+ - When the model emits the tool call JSON for `search_documentation`, the app uses the retriever to query the local ChromaDB and shows sources in the right pane.
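
A rough sketch of that dispatch step (the tool‑call format and helper names are assumptions; the real app follows whatever format its system prompt defines):

```python
# Illustrative sketch of detecting and serving a tool call; not the actual app.py logic.
import json
import re

def maybe_handle_tool_call(model_output: str):
    """If the output contains a search_documentation tool call, run the retriever."""
    match = re.search(r'\{.*"search_documentation".*\}', model_output, re.DOTALL)
    if not match:
        return None
    try:
        call = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    query = call.get("arguments", {}).get("query", "")
    return search_documentation(query)   # retriever sketch shown earlier in this README
```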
318
+
319
+
320
+ ---
321
+
322
+ ## Project layout
323
+
324
+ ```
325
+ .
326
+ ├── app.py # Main Gradio app: model load, streaming, tool-calling
327
+ └── rag-db/
328
+ ├── generate_db.py # Clone gprMax, chunk docs, build ChromaDB, save metadata
329
+ ├── retriever.py # Persistent Chroma client + search utilities
330
+ └── chroma_db/ # (created at runtime) persistent vector DB + metadata.json
331
+ ```
332
+
333
+ - The app will **auto‑build** the DB by **pulling gprMax github repo and embedding *latest* documents** if it’s missing, then load it for searches.
334
+
335
+ - The builder saves `metadata.json` with the collection name (`gprmax_docs_v1`), chunking settings, and the embedding label.
336
+
337
+ - The retriever uses a persistent client and turns distances into a simple score for display.
338
+
339
+
340
+ ---
341
+
342
+
343
+ ## Tips & troubleshooting
344
+
345
+ - **GPU out‑of‑memory?** Lower **Max New Tokens** in Settings or run on CPU; the app chooses CUDA if available, otherwise CPU.
346
+
347
+ - **No docs in sources panel?** Build the DB manually:
348
+
349
+ ```bash
350
+ python rag-db/generate_db.py --recreate
351
+ ```
352
+
353
+
354
+ This clones the official repo, chunks `/docs` (size **1000**, overlap **200**), builds the `gprmax_docs_v1` collection, and writes metadata.
355
+
356
+ - **First response is slow.** That’s probably first‑time model load and DB creation. Later runs cache the DB, so it’s faster.
357
+
358
+ - Smaller models tend to **overthink**([Cuadron, Alejandro, et al.,2025](https://arxiv.org/abs/2502.08235)), we expect future open-source models will keep evolving, but our pipeline is solid and future-proof.
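
For reference, a minimal sketch of the fixed‑size chunking described above (size 1000, overlap 200); the function name is an assumption, not the actual `generate_db.py` code:

```python
# Minimal sketch of fixed-size chunking with overlap; not the actual generate_db.py code.
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200):
    """Split a document into overlapping chunks for indexing."""
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk.strip():
            chunks.append(chunk)
    return chunks
```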
359
+
360
+ ## License note
361
+
362
+ The retriever indexes text from the official gprMax documentation. Please follow the gprMax license for any reuse of that content.
363
+
364
+ **Thanks:** the gprMax team and community, plus the open‑source ML stack (Transformers, Gradio, ChromaDB).