gprMax RAG Database System
Overview
This is a production-ready Retrieval-Augmented Generation (RAG) system for the gprMax documentation. It provides efficient vector search over the docs, enabling intelligent context retrieval for the chatbot.
Architecture
Components
- Document Processor: Extracts and chunks documentation from gprMax GitHub repository
- Embedding Model: Qwen2.5-0.5B (will upgrade to Qwen3-Embedding-0.6B when available)
- Vector Database: ChromaDB with persistent storage
- Retriever: Search and context retrieval utilities
Key Features
- Automatic documentation extraction from gprMax GitHub repository
- Intelligent chunking with configurable size and overlap
- Persistent vector database using ChromaDB
- Efficient similarity search with score thresholding
- Metadata tracking for reproducibility
Installation
The database is automatically generated on first startup of the application. No manual installation required!
Automatic Generation
When the app starts:
- Checks if the database exists at rag-db/chroma_db/
- If not found, automatically runs generate_db.py
- Clones the gprMax repository and processes the documentation
- Creates the ChromaDB with default embeddings (all-MiniLM-L6-v2)
- Ready to use: this only happens once!
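A minimal sketch of this startup check (the path and invocation mirror the steps above; the ensure_database function name is hypothetical):

# Hypothetical startup check mirroring the behaviour described above.
from pathlib import Path
import subprocess
import sys

DB_PATH = Path("rag-db/chroma_db")

def ensure_database() -> None:
    """Build the vector database on first startup if it is missing."""
    if DB_PATH.exists():
        return  # Database already built; nothing to do.
    # generate_db.py clones gprMax, chunks the docs, and builds the ChromaDB.
    subprocess.run([sys.executable, "rag-db/generate_db.py"], check=True)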
Manual Generation (Optional)
If you need to manually regenerate the database:
cd rag-db
python generate_db.py --recreate
Custom settings:
python generate_db.py \
--db-path ./custom_db \
--temp-dir ./temp \
--device cuda \
--recreate
Use Retriever in Application
from rag_db.retriever import create_retriever
# Initialize retriever
retriever = create_retriever(db_path="./rag-db/chroma_db")
# Search for relevant documents
results = retriever.search("How to create a source?", k=5)
# Get formatted context for LLM
context = retriever.get_context("antenna patterns", k=3)
# Get relevant source files
files = retriever.get_relevant_files("boundary conditions")
# Get database statistics
stats = retriever.get_stats()
Test Retriever
# Test with default query
python retriever.py
# Test with custom query
python retriever.py "How to model soil layers?"
Database Schema
Document Structure
{
  "id": "unique_hash",
  "text": "document_chunk_text",
  "metadata": {
    "source": "docs/relative/path.rst",
    "file_type": ".rst",
    "chunk_index": 0,
    "char_start": 0,
    "char_end": 1000
  }
}
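A sketch of assembling such a record; the SHA-256 ID scheme shown here is an assumption, not necessarily what generate_db.py actually uses:

import hashlib

def make_document(text: str, source: str, chunk_index: int,
                  char_start: int, char_end: int) -> dict:
    """Build a document record matching the schema above (ID scheme assumed)."""
    doc_id = hashlib.sha256(f"{source}:{chunk_index}:{text}".encode()).hexdigest()
    return {
        "id": doc_id,
        "text": text,
        "metadata": {
            "source": source,
            "file_type": ".rst",
            "chunk_index": chunk_index,
            "char_start": char_start,
            "char_end": char_end,
        },
    }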
Metadata File
Generated metadata.json contains:
{
  "created_at": "2024-01-01T00:00:00",
  "embedding_model": "Qwen/Qwen2.5-0.5B",
  "collection_name": "gprmax_docs_v1",
  "chunk_size": 1000,
  "chunk_overlap": 200,
  "total_documents": 1234
}
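For illustration, a minimal sketch of writing this file at generation time (the write_metadata helper is hypothetical; field values mirror the example above):

import json
from datetime import datetime, timezone

def write_metadata(total_documents: int, path: str = "rag-db/metadata.json") -> None:
    """Record generation settings for reproducibility."""
    metadata = {
        "created_at": datetime.now(timezone.utc).isoformat(),
        "embedding_model": "Qwen/Qwen2.5-0.5B",
        "collection_name": "gprmax_docs_v1",
        "chunk_size": 1000,
        "chunk_overlap": 200,
        "total_documents": total_documents,
    }
    with open(path, "w") as f:
        json.dump(metadata, f, indent=2)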
Configuration
Chunking Parameters
- CHUNK_SIZE: 1000 characters (optimal for context windows)
- CHUNK_OVERLAP: 200 characters (ensures continuity)
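A minimal sketch of fixed-size chunking with this overlap; the real generate_db.py may split on sentence or section boundaries instead:

CHUNK_SIZE = 1000     # characters per chunk
CHUNK_OVERLAP = 200   # characters shared between consecutive chunks

def chunk_text(text: str) -> list[str]:
    """Split text into fixed-size chunks that overlap by CHUNK_OVERLAP characters."""
    step = CHUNK_SIZE - CHUNK_OVERLAP
    return [text[i:i + CHUNK_SIZE]
            for i in range(0, max(len(text) - CHUNK_OVERLAP, 1), step)]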
Embedding Model
- Current: Qwen/Qwen2.5-0.5B (512-dim embeddings)
- Future: Qwen/Qwen3-Embedding-0.6B (when available)
Database Settings
- Storage: ChromaDB persistent client
- Collection: gprmax_docs_v1 (versioned for updates)
- Distance Metric: cosine similarity
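For reference, a minimal sketch of opening such a collection with ChromaDB's persistent client; the "hnsw:space" metadata key is ChromaDB's documented way to select cosine distance:

import chromadb

# Persistent client keeps the index on disk under the given path.
client = chromadb.PersistentClient(path="./rag-db/chroma_db")

# Versioned collection name; "hnsw:space" selects the cosine distance metric.
collection = client.get_or_create_collection(
    name="gprmax_docs_v1",
    metadata={"hnsw:space": "cosine"},
)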
Maintenance
Regular Updates
Run monthly, or whenever the gprMax documentation is updated:
# This will pull latest docs and update database
python generate_db.py
Database Backup
# Backup database
cp -r chroma_db chroma_db_backup_$(date +%Y%m%d)
Performance Tuning
- Adjust CHUNK_SIZE and CHUNK_OVERLAP in generate_db.py
- Modify batch sizes for large datasets
- Use GPU acceleration with --device cuda
Integration with Main App
The RAG system integrates with the main Gradio app:
- Import the retriever in app.py
- Use the retriever to augment prompts with context
- Display source references in the UI
Example integration:
# In app.py
from rag_db.retriever import create_retriever

retriever = create_retriever()

def augment_with_context(user_query):
    context = retriever.get_context(user_query, k=3)
    augmented_prompt = f"""
Context from documentation:
{context}

User question: {user_query}
"""
    return augmented_prompt
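The returned prompt can then be passed to the app's model call, and retriever.get_relevant_files() (shown earlier) can supply the source references displayed in the UI.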
Troubleshooting
Common Issues
Database not found
- Run python generate_db.py first
- Check the --db-path parameter
Out of memory
- Use smaller batch sizes
- Use CPU instead of GPU
- Reduce chunk size
Slow generation
- Use GPU with --device cuda
- Reduce repository depth with a shallow clone
- Use a pre-generated database
Logs
Check generation logs for detailed information:
python generate_db.py 2>&1 | tee generation.log
Future Enhancements
- Model Upgrade: Migrate to Qwen3-Embedding-0.6B when available
- Incremental Updates: Add documents without full regeneration
- Multi-modal Support: Include images and diagrams from docs
- Query Expansion: Automatic query reformulation for better retrieval
- Caching Layer: Redis cache for frequent queries
- Fine-tuned Embeddings: Domain-specific embedding model for gprMax
License
Same as the parent project.