---
title: Medical Chatbot
emoji: 🏥
colorFrom: blue
colorTo: green
sdk: streamlit
sdk_version: 1.28.1
app_file: app.py
pinned: false
license: apache-2.0
tags:
  - medical
  - chatbot
  - rag
  - gemini
  - streamlit
---

Medical Chatbot 🏥

A medical question-answering chatbot built on retrieval-augmented generation (RAG): Sentence Transformers produce the embeddings, Pinecone stores and searches them, and Gemini 1.5 Flash generates answers from the retrieved context.

Features

🤖 Powered by Gemini 1.5 Flash for response generation
📊 Uses Sentence Transformers for semantic search
🔍 Retrieves relevant medical information from a Pinecone vector database
📚 Provides citations with source attribution
🎯 Confidence scoring for each response
🌐 Interactive Streamlit chat interface
⚠️ Clear disclaimers that responses are not medical advice

Prerequisites

Python 3.8 or higher
Pinecone account (https://www.pinecone.io/)
Google AI Studio API key (https://makersuite.google.com/app/apikey)
Hugging Face account (optional, for accessing datasets)

Installation

For detailed step-by-step instructions, see QUICK_START.md

Clone or download this repository
Install dependencies:

pip install -r requirements.txt

Create a .env file in the root directory:

PINECONE_API_KEY=your_pinecone_api_key_here
PINECONE_ENVIRONMENT=us-east1
GOOGLE_API_KEY=your_google_api_key_here

Set up the database:

python setup_database.py

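This creates the Pinecone index, downloads the medical question-and-answer data from Hugging Face, converts it to embeddings with Sentence Transformers, and uploads the embeddings to Pinecone.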
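Before running setup_database.py or the app, it can help to confirm that the .env file is complete, since a missing key is what the "API Key not found" error under Troubleshooting points to. The sketch below is a minimal, stdlib-only check under stated assumptions: `load_env_file` and `missing_keys` are hypothetical helpers (the app may instead use a library such as python-dotenv), the parser only understands simple `KEY=value` lines, and any value still starting with `your_` is treated as an unreplaced placeholder.

```python
from __future__ import annotations

import os
from pathlib import Path

# The three variables this README asks you to put in .env.
REQUIRED_KEYS = ("PINECONE_API_KEY", "PINECONE_ENVIRONMENT", "GOOGLE_API_KEY")


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Hypothetical minimal .env reader: parses KEY=value lines.

    Blank lines, comments (#) and lines without '=' are skipped. A required key
    that is already set in the process environment overrides the file value.
    """
    values: dict[str, str] = {}
    env_path = Path(path)
    if env_path.exists():
        for raw_line in env_path.read_text(encoding="utf-8").splitlines():
            line = raw_line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # Drop surrounding whitespace and any single or double quotes.
            values[key.strip()] = value.strip().strip("\"'")
    for key in REQUIRED_KEYS:
        if key in os.environ:
            values[key] = os.environ[key]
    return values


def missing_keys(values: dict[str, str]) -> list[str]:
    """Required keys that are absent, empty, or still a 'your_...' placeholder."""
    return [
        key
        for key in REQUIRED_KEYS
        if not values.get(key) or values[key].startswith("your_")
    ]
```

Calling `missing_keys(load_env_file())` at startup and stopping with a message that names the missing keys turns a vague failure into an actionable one.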

Usage

Run the Streamlit application:

streamlit run app.py

Open the URL that Streamlit prints in the terminal (typically http://localhost:8501)

Quick Start Guide: QUICK_START.md

How It Works

Data Loading: Medical questions and answers are loaded from Hugging Face datasets
Embedding: Texts are converted to embeddings using Sentence Transformers
Vector Storage: Embeddings are stored in Pinecone for fast similarity search
Query Processing: User queries are embedded with the same Sentence Transformers model, and the most similar stored embeddings are retrieved from Pinecone
Response Generation: Gemini 1.5 Flash generates responses based on retrieved context
Citation: Sources are tracked and displayed with confidence scores
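The retrieval, citation, and confidence steps above can be sketched with the standard library alone. In this sketch, short hand-written vectors stand in for Sentence Transformers embeddings and a linear scan with cosine similarity stands in for a Pinecone query. The `Document` type, the prompt wording, and defining "confidence" as the best similarity score are assumptions for illustration, not the app's actual code. The resulting prompt would then be sent to Gemini 1.5 Flash; that call is not shown.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Document:
    """One stored passage (hypothetical type). The real vectors would come from Sentence Transformers."""
    doc_id: str
    text: str
    source: str  # shown to the user as the citation
    embedding: list[float]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(
    query_embedding: list[float],
    docs: list[Document],
    top_k: int = 3,
    threshold: float = 0.5,
) -> list[tuple[float, Document]]:
    """Linear-scan stand-in for a Pinecone query: best top_k matches at or above threshold."""
    scored = [(cosine_similarity(query_embedding, d.embedding), d) for d in docs]
    kept = [pair for pair in scored if pair[0] >= threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return kept[:top_k]


def build_prompt(question: str, hits: list[tuple[float, Document]]) -> str:
    """Number each retrieved passage so the model can cite it as [1], [2], ..."""
    context = "\n".join(
        f"[{i}] ({doc.source}) {doc.text}" for i, (_, doc) in enumerate(hits, start=1)
    )
    return (
        "Answer the question using only the numbered context, citing passages "
        "like [1]. If the context is not sufficient, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


def confidence(hits: list[tuple[float, Document]]) -> float:
    """Hypothetical confidence score: the best similarity, or 0.0 when nothing matched."""
    return max((score for score, _ in hits), default=0.0)
```

Numbering the passages in the prompt is what lets each claim in the answer point back to a `source`. Whether Pinecone's returned scores are cosine similarities depends on the metric the index was created with.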

Important Disclaimers

⚠️ This is not medical advice
⚠️ Not a substitute for professional healthcare
⚠️ Always consult healthcare professionals for medical decisions
⚠️ Confidence scores indicate data quality, not medical accuracy

Configuration

Edit config.py to customize:

Embedding model
Number of retrieved documents (TOP_K)
Similarity threshold
Dataset selection
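A config.py covering these settings might look like the sketch below. Only `TOP_K` and `SIMILARITY_THRESHOLD` are names taken from this README; `EMBEDDING_MODEL`, `EMBEDDING_DIMENSION`, `DATASET_NAME`, `validate_config`, and every value shown are hypothetical, and the dataset ID is a placeholder rather than the dataset the app actually uses.

```python
"""Hypothetical config.py sketch.

TOP_K and SIMILARITY_THRESHOLD are the names this README uses; the other
names and all values are illustrative assumptions.
"""

# Sentence Transformers model used to embed both stored texts and user queries.
# all-MiniLM-L6-v2 produces 384-dimensional vectors.
EMBEDDING_MODEL = "sentence-transformers/all-MiniLM-L6-v2"
EMBEDDING_DIMENSION = 384  # the Pinecone index must be created with this dimension

# Number of documents retrieved per query.
TOP_K = 5

# Minimum cosine similarity (at most 1.0) for a match to be used as context.
SIMILARITY_THRESHOLD = 0.5

# Hugging Face dataset loaded by setup_database.py (placeholder ID, not a real dataset).
DATASET_NAME = "your-org/your-medical-qa-dataset"


def validate_config() -> None:
    """Fail fast on values that would make retrieval return nothing useful."""
    if TOP_K < 1:
        raise ValueError("TOP_K must be at least 1")
    if not -1.0 <= SIMILARITY_THRESHOLD <= 1.0:
        raise ValueError("SIMILARITY_THRESHOLD must lie in [-1.0, 1.0] for cosine similarity")
```

Because stored vectors and query vectors must come from the same model, changing the embedding model (or the dataset) generally means re-running `python setup_database.py`, and a model with a different output size needs an index with a matching dimension.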

Troubleshooting

"API Key not found"

Ensure your .env file exists and contains valid API keys

"Index not found"

Run python setup_database.py to create the Pinecone index

"No results found"

The similarity threshold may be too high; lower SIMILARITY_THRESHOLD in config.py
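To pick a lower value, it can help to look at the similarity scores that a few typical queries actually receive, for example by logging them before the threshold filter is applied (how to do that depends on the app's code). The hypothetical helpers below count how many matches each candidate threshold would keep and suggest the highest one that still returns results; the example scores are made up for illustration, not measured from the app.

```python
from __future__ import annotations


def matches_per_threshold(scores: list[float], thresholds: list[float]) -> dict[float, int]:
    """Count how many retrieved similarity scores would pass each candidate threshold."""
    return {t: sum(1 for s in scores if s >= t) for t in thresholds}


def suggest_threshold(
    scores: list[float], thresholds: list[float], min_matches: int = 1
) -> float | None:
    """Highest candidate threshold that still keeps at least min_matches results, or None."""
    counts = matches_per_threshold(scores, thresholds)
    passing = [t for t, n in counts.items() if n >= min_matches]
    return max(passing, default=None)


# Illustrative scores for one query (made up, not measured from the app).
example_scores = [0.62, 0.48, 0.41, 0.33]
print(matches_per_threshold(example_scores, [0.7, 0.5, 0.4]))  # {0.7: 0, 0.5: 1, 0.4: 3}
```

A threshold that keeps zero matches for ordinary questions is a sign it is set too high for the embedding model in use.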

License

This project is released under the Apache 2.0 license (license: apache-2.0 in the Space metadata) and is intended for educational purposes only. Verify any medical information with a healthcare professional.