---
title: Medical Chatbot
emoji: 🏥
colorFrom: blue
colorTo: green
sdk: streamlit
sdk_version: 1.28.1
app_file: app.py
pinned: false
license: apache-2.0
tags:
- medical
- chatbot
- rag
- gemini
- streamlit
---
# Medical Chatbot 🏥
A medical question-answering chatbot built on retrieval-augmented generation (RAG): Sentence Transformers embed the text, Pinecone stores and searches the embeddings, and Gemini 1.5 Flash generates the answers.
## Features
- 🤖 Powered by Gemini 1.5 Flash for natural language understanding and response generation
- Sentence Transformers for semantic search
- Retrieves relevant medical information from a Pinecone vector database
- Citations with source attribution for each answer
- 🎯 A confidence score for each response
- Streamlit web interface
- ⚠️ Prominent disclaimers that responses are not medical advice
## Prerequisites
1. Python 3.8 or higher
2. Pinecone account (https://www.pinecone.io/)
3. Google AI Studio API key (https://makersuite.google.com/app/apikey)
4. Hugging Face account (optional, for accessing datasets)
## Installation
**For detailed step-by-step instructions, see [QUICK_START.md](QUICK_START.md)**
1. Clone or download this repository
2. Install dependencies:
```bash
pip install -r requirements.txt
```
3. Create a `.env` file in the root directory:
```env
PINECONE_API_KEY=your_pinecone_api_key_here
PINECONE_ENVIRONMENT=us-east1
GOOGLE_API_KEY=your_google_api_key_here
```
4. Set up the database:
```bash
python setup_database.py
```
This downloads the medical Q&A data from Hugging Face, embeds it with Sentence Transformers, and uploads the embeddings to Pinecone.
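An upload step like this typically turns each question/answer pair into a record holding an ID, an embedding, and metadata, then sends the records in batches. The sketch below illustrates that idea only; it is not the code in `setup_database.py`. The helper names `make_records` and `chunked`, the `source` metadata field, and the hash-based ID scheme are all assumptions, and `embed` stands for any text-to-vector function (in this project, a Sentence Transformers model would play that role).

```python
import hashlib


def make_records(pairs, embed, source="huggingface"):
    """Turn (question, answer) pairs into id/values/metadata records.

    Hypothetical helper: `embed` is any callable mapping text to a list of
    floats (assumed to be a Sentence Transformers model in practice), and the
    `source` field is an illustrative metadata key for later citations.
    """
    records = []
    for question, answer in pairs:
        text = f"Q: {question}\nA: {answer}"
        # Stable ID derived from the text, so identical pairs get identical IDs.
        rec_id = hashlib.sha1(text.encode("utf-8")).hexdigest()
        records.append({
            "id": rec_id,
            "values": embed(text),
            "metadata": {"question": question, "answer": answer, "source": source},
        })
    return records


def chunked(items, size=100):
    """Yield successive slices of at most `size` items for batched uploads."""
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

Each batch could then be sent to a Pinecone index, for example with `index.upsert(vectors=batch)`. Because Pinecone's upsert overwrites records that share an ID, deriving IDs from the text means re-running a setup step written this way would update entries rather than duplicate them.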
## Usage
Run the Streamlit application:
```bash
streamlit run app.py
```
Then open the URL printed in the terminal (typically http://localhost:8501) in your browser.
**Quick Start Guide:** [QUICK_START.md](QUICK_START.md)
## How It Works
1. **Data Loading**: Medical questions and answers are loaded from Hugging Face datasets
2. **Embedding**: Texts are converted to embeddings using Sentence Transformers
3. **Vector Storage**: Embeddings are stored in Pinecone for fast similarity search
4. **Query Processing**: User queries are embedded and searched against the database
5. **Response Generation**: Gemini 1.5 Flash generates responses based on retrieved context
6. **Citation**: Sources are tracked and displayed with confidence scores
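Steps 2 to 6 can be illustrated with a self-contained toy version of the retrieve-then-prompt loop. This is a sketch under simplifying assumptions, not the app's implementation: a bag-of-words count vector stands in for Sentence Transformers, a Python loop stands in for the Pinecone similarity search, and `retrieve`, `build_prompt`, the document format, and the 0.2 threshold are hypothetical. In the real app the resulting prompt would go to Gemini 1.5 Flash instead of being printed.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a Sentence Transformers model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, docs, top_k=3, threshold=0.2):
    """Score every doc against the query, drop those below the threshold, return the best top_k."""
    q = embed(query)
    scored = [(cosine(q, embed(d["text"])), d) for d in docs]
    hits = [(s, d) for s, d in scored if s >= threshold]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return hits[:top_k]


def build_prompt(query, hits):
    """Number each retrieved passage so the model's answer can cite [1], [2], ..."""
    context = "\n".join(
        f"[{i}] ({d['source']}, score={s:.2f}) {d['text']}"
        for i, (s, d) in enumerate(hits, 1)
    )
    return (
        "Answer using only the numbered context below and cite sources by number.\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


# Example run with two made-up documents.
docs = [
    {"text": "Fever is a temporary rise in body temperature.", "source": "doc-1"},
    {"text": "Aspirin is a common pain reliever.", "source": "doc-2"},
]
query = "What causes a rise in body temperature?"
hits = retrieve(query, docs)
print(build_prompt(query, hits))
```

With these inputs only `doc-1` clears the threshold (similarity of about 0.67 versus about 0.15 for `doc-2`), so the prompt contains a single numbered source. The per-source score carried into the prompt is one way citations and confidence values can travel together from retrieval to display.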
## Important Disclaimers
- ⚠️ **This is not medical advice**
- ⚠️ **Not a substitute for professional healthcare**
- ⚠️ **Always consult healthcare professionals for medical decisions**
- ⚠️ **Confidence scores indicate data quality, not medical accuracy**
## Configuration
Edit `config.py` to customize:
- Embedding model
- Number of retrieved documents (TOP_K)
- Similarity threshold
- Dataset selection
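A `config.py` covering these settings might look roughly like the sketch below. Only `TOP_K` and `SIMILARITY_THRESHOLD` are names taken from this README; `EMBEDDING_MODEL`, `DATASET_NAME`, every value, and the `validate_config` helper are hypothetical placeholders, so check the actual file before editing it.

```python
# config.py: hypothetical sketch. Apart from TOP_K and SIMILARITY_THRESHOLD,
# the names below and all values are assumptions, not the project's settings.

# Assumed model choice. If you change the model, the Pinecone index dimension
# must match the new embedding size, which likely means re-running setup.
EMBEDDING_MODEL = "sentence-transformers/all-MiniLM-L6-v2"

TOP_K = 3                   # number of documents retrieved per query
SIMILARITY_THRESHOLD = 0.5  # minimum similarity score for a document to be used

DATASET_NAME = "your-org/your-medical-qa-dataset"  # placeholder Hugging Face dataset ID


def validate_config():
    """Fail fast on obviously bad settings (hypothetical helper)."""
    if not isinstance(TOP_K, int) or TOP_K < 1:
        raise ValueError("TOP_K must be a positive integer")
    # Assumes cosine-style scores, where a useful threshold lies between 0 and 1.
    if not 0.0 <= SIMILARITY_THRESHOLD <= 1.0:
        raise ValueError("SIMILARITY_THRESHOLD should be between 0 and 1")
```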
## Troubleshooting
### "API Key not found"
- Ensure the `.env` file exists in the project root and contains valid API keys rather than the `your_..._here` placeholders
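To see which variable is actually missing, a quick check like the one below can help. It is a simplified stand-in for whatever loader the app uses (possibly `python-dotenv`, which is an assumption here); `read_env_file`, `missing_keys`, and the placeholder check are hypothetical helpers, while the three key names come from the `.env` template in the Installation section.

```python
from pathlib import Path

# Key names from the README's .env template.
REQUIRED_KEYS = ("PINECONE_API_KEY", "PINECONE_ENVIRONMENT", "GOOGLE_API_KEY")


def read_env_file(path=".env"):
    """Minimal KEY=VALUE parser; skips blank lines and # comments (hypothetical helper)."""
    values = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values):
    """Return required keys that are absent, empty, or still a `your_...` placeholder."""
    return [
        key for key in REQUIRED_KEYS
        if not values.get(key) or values[key].startswith("your_")
    ]
```

Running `print(missing_keys(read_env_file()))` from the project root prints an empty list when all three variables are set.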
### "Index not found"
- Run `python setup_database.py` to create the Pinecone index
### "No results found"
- The similarity threshold might be too high
- Try lowering `SIMILARITY_THRESHOLD` in `config.py`
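When tuning the threshold, it helps to look at the raw similarity scores for a query that fails (for example, the `score` on each match a Pinecone query returns). The hypothetical helper below summarizes how those scores compare with a candidate threshold; it is a sketch, and the example scores are made up.

```python
def diagnose_threshold(scores, threshold):
    """Report how many scores clear `threshold` and how far the best one falls short.

    A positive `gap_to_threshold` is the amount by which the best score misses
    the threshold; a negative gap means at least one result already passes.
    """
    passing = [s for s in scores if s >= threshold]
    best = max(scores, default=None)
    gap = None if best is None else round(threshold - best, 4)
    return {"passing": len(passing), "best_score": best, "gap_to_threshold": gap}


# Made-up scores for a query that returned "No results found" at threshold 0.5.
print(diagnose_threshold([0.42, 0.38, 0.11], threshold=0.5))
```

Here the best match misses by 0.08, so a threshold near 0.4 would let it through. A large gap, by contrast, suggests the database has little relevant content for that question, and lowering the threshold would mostly admit noise.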
## License
This project is released under the Apache 2.0 license and is intended for educational purposes only. Medical information should be verified with healthcare professionals.