metadata
title: Medical Chatbot
emoji: 🏥
colorFrom: blue
colorTo: green
sdk: streamlit
sdk_version: 1.28.1
app_file: app.py
pinned: false
license: apache-2.0
tags:
- medical
- chatbot
- rag
- gemini
- streamlit
Medical Chatbot 🏥
A medical question-answering chatbot built on retrieval-augmented generation (RAG): Sentence Transformers embed the text, Pinecone stores and searches the vectors, and Gemini 1.5 Flash generates the answer.
Features
- Powered by Gemini 1.5 Flash for natural language understanding
- Uses Sentence Transformers for semantic search
- Retrieves relevant medical information from a vector database
- Provides citations with source attribution
- Confidence scoring for each response
- Beautiful Streamlit interface
- Important disclaimers for medical advice
Prerequisites
- Python 3.8 or higher
- Pinecone account (https://www.pinecone.io/)
- Google AI Studio API key (https://makersuite.google.com/app/apikey)
- Hugging Face account (optional, for accessing datasets)
Installation
For detailed step-by-step instructions, see QUICK_START.md
- Clone or download this repository
- Install dependencies:
pip install -r requirements.txt
- Create a .env file in the root directory:
PINECONE_API_KEY=your_pinecone_api_key_here
PINECONE_ENVIRONMENT=us-east1
GOOGLE_API_KEY=your_google_api_key_here
- Set up the database:
python setup_database.py
This will download medical data from Hugging Face and upload it to Pinecone.
Usage
Run the Streamlit application:
streamlit run app.py
Open your browser to the URL shown (typically http://localhost:8501)
Quick Start Guide: QUICK_START.md
How It Works
- Data Loading: Medical questions and answers are loaded from Hugging Face datasets
- Embedding: Texts are converted to embeddings using Sentence Transformers
- Vector Storage: Embeddings are stored in Pinecone for fast similarity search
- Query Processing: User queries are embedded and searched against the database
- Response Generation: Gemini 1.5 Flash generates responses based on retrieved context
- Citation: Sources are tracked and displayed with confidence scores
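The steps above can be sketched end to end without any external service. This is an illustrative sketch only, not the code in app.py: a word-count vector stands in for a Sentence Transformers embedding, an in-memory list stands in for the Pinecone index, and the final prompt is printed rather than sent to Gemini 1.5 Flash. The function names, record fields, and sample records are all hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a Sentence Transformers model: a bag-of-words vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(records):
    """Toy stand-in for the Pinecone index: a list of (vector, record) pairs."""
    return [(embed(r["question"] + " " + r["answer"]), r) for r in records]


def retrieve(index, query, top_k=3):
    """Embed the query and return the top_k (score, record) pairs."""
    q = embed(query)
    scored = [(cosine(q, vec), rec) for vec, rec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


def build_prompt(query, matches):
    """Number each retrieved entry so the model can cite it by index."""
    context = "\n".join(
        f"[{i}] ({rec['source']}, score={score:.2f}) {rec['answer']}"
        for i, (score, rec) in enumerate(matches, start=1)
    )
    return (
        "Answer the question using only the numbered context below "
        "and cite the entries you use.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


# Made-up placeholder records; the real data comes from a Hugging Face dataset.
records = [
    {"question": "What are common symptoms of influenza?",
     "answer": "Fever, cough, sore throat, and muscle aches are common.",
     "source": "example-dataset/0"},
    {"question": "How is type 2 diabetes managed?",
     "answer": "Management usually combines diet, exercise, and medication.",
     "source": "example-dataset/1"},
]

index = build_index(records)
query = "What are the symptoms of influenza?"
matches = retrieve(index, query, top_k=1)
prompt = build_prompt(query, matches)
print(prompt)
```

In the real pipeline the prompt would go to the generation model, and the similarity scores would be shown next to each cited source.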
Important Disclaimers
- ⚠️ This is not medical advice
- ⚠️ Not a substitute for professional healthcare
- ⚠️ Always consult healthcare professionals for medical decisions
- ⚠️ Confidence scores indicate data quality, not medical accuracy
Configuration
Edit config.py to customize:
- Embedding model
- Number of retrieved documents (TOP_K)
- Similarity threshold
- Dataset selection
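As a rough picture of what these settings might look like, here is a hypothetical config.py layout. Only TOP_K and SIMILARITY_THRESHOLD are names taken from this README; EMBEDDING_MODEL, DATASET_NAME, validate_config, and every default value are assumptions made for illustration, so check the actual file before editing.

```python
# Hypothetical config.py layout. Names other than TOP_K and
# SIMILARITY_THRESHOLD, and all values, are assumptions.

EMBEDDING_MODEL = "all-MiniLM-L6-v2"  # assumed example Sentence Transformers model
TOP_K = 5                             # number of documents retrieved per query
SIMILARITY_THRESHOLD = 0.5            # minimum score for a match to be kept
DATASET_NAME = "your-dataset-id"      # placeholder, not a real dataset id


def validate_config(top_k, similarity_threshold):
    """Fail early on values that would make retrieval return nothing useful."""
    if not isinstance(top_k, int) or top_k < 1:
        raise ValueError("TOP_K must be a positive integer")
    # Assumes cosine similarity, whose scores fall between -1 and 1.
    if not -1.0 <= similarity_threshold <= 1.0:
        raise ValueError("SIMILARITY_THRESHOLD must be between -1 and 1")


validate_config(TOP_K, SIMILARITY_THRESHOLD)
```

Validating at import time is a design choice in this sketch: a bad value then fails at startup rather than showing up later as empty search results.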
Troubleshooting
"API Key not found"
- Ensure your .env file exists and contains valid API keys
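A quick way to narrow this down is to check which keys are missing or still hold the placeholder values from the installation step. The sketch below is a standalone diagnostic and makes no assumption about how app.py loads its environment; parse_env_file and missing_keys are hypothetical helper names.

```python
import os

# Key names taken from the .env example in the installation section.
REQUIRED_KEYS = ("PINECONE_API_KEY", "PINECONE_ENVIRONMENT", "GOOGLE_API_KEY")


def parse_env_file(path=".env"):
    """Read simple KEY=value lines; returns an empty dict if the file is absent."""
    values = {}
    if not os.path.exists(path):
        return values
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values, required=REQUIRED_KEYS):
    """Return required keys that are unset, empty, or still a 'your_...' placeholder."""
    return [
        key for key in required
        if not values.get(key) or values[key].startswith("your_")
    ]


if __name__ == "__main__":
    # Real environment variables take precedence over the file in this sketch.
    merged = {**parse_env_file(), **os.environ}
    problems = missing_keys(merged)
    print("Missing or placeholder keys:", problems if problems else "none")
```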
"Index not found"
- Run python setup_database.py to create the Pinecone index
"No results found"
- The similarity threshold might be too high
- Lower SIMILARITY_THRESHOLD in config.py
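To see why a high threshold can leave nothing to show, consider this small sketch. The match dictionaries, their scores, and the filter_matches helper are made up for illustration, and the shape of the real search results may differ.

```python
def filter_matches(matches, threshold):
    """Keep only matches whose similarity score meets the threshold."""
    return [m for m in matches if m["score"] >= threshold]


# Invented scores, chosen only to show the effect of the threshold.
matches = [
    {"id": "doc-1", "score": 0.62},
    {"id": "doc-2", "score": 0.48},
    {"id": "doc-3", "score": 0.31},
]

for threshold in (0.7, 0.5, 0.3):
    kept = filter_matches(matches, threshold)
    print(f"threshold={threshold}: {len(kept)} result(s)")
```

With these example scores, a threshold of 0.7 filters out every match, while 0.3 keeps all three.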
License
Released under the Apache 2.0 license. This project is for educational purposes only; verify any medical information with a healthcare professional.