medchat / README.md
vihashini-18
metadata
title: Medical Chatbot
emoji: 🏥
colorFrom: blue
colorTo: green
sdk: streamlit
sdk_version: 1.28.1
app_file: app.py
pinned: false
license: apache-2.0
tags:
  - medical
  - chatbot
  - rag
  - gemini
  - streamlit

Medical Chatbot 🏥

A medical question-answering chatbot built on retrieval-augmented generation (RAG): Sentence Transformers embed the query, Pinecone retrieves the most similar entries, and Gemini 1.5 Flash generates an answer from the retrieved context.

Features

  • 🤖 Powered by Gemini 1.5 Flash for natural language understanding
  • 📊 Uses Sentence Transformers for semantic search
  • 🔍 Retrieves relevant medical information from a vector database
  • 📚 Provides citations with source attribution
  • 🎯 Confidence scoring for each response
  • 🌐 Beautiful Streamlit interface
  • ⚠️ Important disclaimers for medical advice

Prerequisites

  1. Python 3.8 or higher
  2. Pinecone account (https://www.pinecone.io/)
  3. Google AI Studio API key (https://makersuite.google.com/app/apikey)
  4. Hugging Face account (optional, for accessing datasets)

Installation

For detailed step-by-step instructions, see QUICK_START.md

  1. Clone or download this repository

  2. Install dependencies:

     pip install -r requirements.txt

  3. Create a .env file in the root directory:

     PINECONE_API_KEY=your_pinecone_api_key_here
     PINECONE_ENVIRONMENT=us-east1
     GOOGLE_API_KEY=your_google_api_key_here

  4. Set up the database:

     python setup_database.py

     This downloads medical data from Hugging Face and uploads it to Pinecone.

Usage

Run the Streamlit application:

streamlit run app.py

Then open the URL shown in the terminal (typically http://localhost:8501) in your browser.

Quick Start Guide: QUICK_START.md

How It Works

  1. Data Loading: Medical questions and answers are loaded from Hugging Face datasets
  2. Embedding: Texts are converted to embeddings using Sentence Transformers
  3. Vector Storage: Embeddings are stored in Pinecone for fast similarity search
  4. Query Processing: User queries are embedded and searched against the database
  5. Response Generation: Gemini 1.5 Flash generates responses based on retrieved context
  6. Citation: Sources are tracked and displayed with confidence scores
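
The six steps above can be sketched end to end without any external services. This is a minimal illustration, not the project's code: a bag-of-words vector stands in for a Sentence Transformers embedding, an in-memory list stands in for the Pinecone index, and the prompt string is what would be sent to Gemini 1.5 Flash. All names (`embed`, `cosine`, `retrieve`, `build_prompt`) and the sample records are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical sample records; the real project loads Q&A pairs from Hugging Face.
DOCS = [
    {"id": "doc-1", "text": "Ibuprofen is commonly used to reduce fever and relieve pain."},
    {"id": "doc-2", "text": "Insulin helps regulate blood sugar levels in people with diabetes."},
    {"id": "doc-3", "text": "Regular exercise can lower blood pressure over time."},
]


def embed(text):
    """Toy embedding: word counts. Stand-in for a Sentence Transformers model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Steps 2-3: embed every document once and keep the vectors (stand-in for Pinecone).
INDEX = [(doc, embed(doc["text"])) for doc in DOCS]


def retrieve(query, top_k=2):
    """Step 4: embed the query and return the top_k most similar documents."""
    q = embed(query)
    scored = [(cosine(q, vec), doc) for doc, vec in INDEX]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


def build_prompt(query, matches):
    """Step 5: assemble the context-grounded prompt that would go to the LLM."""
    context = "\n".join(f"[{doc['id']}] {doc['text']}" for _, doc in matches)
    return (
        "Answer using only the context below and cite source ids.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


if __name__ == "__main__":
    question = "What helps regulate blood sugar?"
    hits = retrieve(question)
    for score, doc in hits:  # Step 6: show sources with their similarity scores
        print(f"{doc['id']}  score={score:.2f}")
    print(build_prompt(question, hits))
```

Keeping the source id next to each retrieved passage is what makes citation possible later: the generated answer can refer back to `[doc-2]`, and the interface can display the score alongside it.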

Important Disclaimers

  • ⚠️ This is not medical advice
  • ⚠️ Not a substitute for professional healthcare
  • ⚠️ Always consult healthcare professionals for medical decisions
  • ⚠️ Confidence scores indicate data quality, not medical accuracy

Configuration

Edit config.py to customize:

  • Embedding model
  • Number of retrieved documents (TOP_K)
  • Similarity threshold
  • Dataset selection
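
For orientation, a config.py covering these four settings might look like the sketch below. Only `TOP_K` and `SIMILARITY_THRESHOLD` are names that appear in this README; `EMBEDDING_MODEL`, `DATASET_NAME`, the `validate` helper, and every value shown are assumptions for illustration, so check the actual file before copying anything.

```python
# Hypothetical config.py layout. Names other than TOP_K and
# SIMILARITY_THRESHOLD, and all values, are illustrative assumptions.

EMBEDDING_MODEL = "all-MiniLM-L6-v2"   # a Sentence Transformers model name
TOP_K = 5                              # number of documents retrieved per query
SIMILARITY_THRESHOLD = 0.5             # minimum score for a match to be used
DATASET_NAME = "your-org/your-medical-dataset"  # placeholder dataset id


def validate():
    """Fail early on settings that cannot work (assumes scores in the 0-1 range)."""
    if not isinstance(TOP_K, int) or TOP_K < 1:
        raise ValueError("TOP_K must be a positive integer")
    if not 0.0 <= SIMILARITY_THRESHOLD <= 1.0:
        raise ValueError("SIMILARITY_THRESHOLD must be between 0 and 1")


validate()
```

Raising `TOP_K` gives the model more context per question at the cost of a longer prompt; raising `SIMILARITY_THRESHOLD` trades recall for stricter matches.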

Troubleshooting

"API Key not found"

  • Ensure your .env file exists and contains valid API keys
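
A quick way to see which key is missing is to check the environment directly. The sketch below uses only the standard library and the variable names from the .env example in the Installation section. The `missing_keys` helper is hypothetical, and it assumes the .env file has already been loaded into the process environment (for example by a package such as python-dotenv). It checks that each key is present and non-blank, not that the key is accepted by the service.

```python
import os

# Variable names taken from the .env example in the Installation section.
REQUIRED_KEYS = ("PINECONE_API_KEY", "PINECONE_ENVIRONMENT", "GOOGLE_API_KEY")


def missing_keys(env=None):
    """Return the required variables that are unset or blank."""
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key, "").strip()]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        print("Missing or empty:", ", ".join(missing))
    else:
        print("All required keys are set.")
```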

"Index not found"

  • Run python setup_database.py to create the Pinecone index

"No results found"

  • The similarity threshold might be too high
  • Adjust SIMILARITY_THRESHOLD in config.py
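
To see why a high threshold produces this message, consider how a threshold is typically applied to retrieval scores. The sketch assumes matches arrive as `(id, score)` pairs with higher scores meaning closer matches. The `filter_matches` helper and the sample scores are made up for illustration and are not taken from the project.

```python
def filter_matches(matches, threshold):
    """Keep only matches whose similarity score meets the threshold."""
    return [(doc_id, score) for doc_id, score in matches if score >= threshold]


# Made-up scores for three retrieved documents.
matches = [("doc-1", 0.62), ("doc-2", 0.48), ("doc-3", 0.31)]

print(filter_matches(matches, 0.8))  # nothing clears 0.8, so the result is empty
print(filter_matches(matches, 0.4))  # doc-1 and doc-2 remain
```

Lowering the threshold in small steps and re-running the same query shows where reasonable matches start to appear.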

License

The Space metadata lists the license as apache-2.0. This project is for educational purposes only. Medical information should be verified with healthcare professionals.