---
title: CogniChat - Chat with Your Documents
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
license: mit
app_port: 7860
---
# 🤖 CogniChat - Intelligent Document Chat System

*Transform your documents into interactive conversations powered by advanced RAG technology*

Features • Quick Start • Architecture • Deployment • API
## 📋 Table of Contents
- Overview
- Features
- Architecture
- Technology Stack
- Quick Start
- Deployment
- Configuration
- API Reference
- Troubleshooting
- Contributing
- License
## 🎯 Overview
CogniChat is a production-ready, intelligent document chat application that leverages Retrieval Augmented Generation (RAG) to enable natural conversations with your documents. Built with enterprise-grade technologies, it provides accurate, context-aware responses from your document corpus.
### Why CogniChat?
- 🔊 Audio Overview of Your Document: Simply ask a question and listen to the audio response. Now your documents can speak to you.
- 🎯 Accurate Retrieval: Hybrid search combining BM25 and FAISS for optimal results
- 💬 Conversational Memory: Maintains context across multiple interactions
- 📄 Multi-Format Support: Handles PDF, DOCX, TXT, and image files
- 🚀 Production Ready: Docker support, comprehensive error handling, and security best practices
- 🎨 Modern UI: Responsive design with dark mode and real-time streaming
## ✨ Features

### Core Capabilities
| Feature | Description | 
|---|---|
| Multi-Format Processing | Upload and process PDF, DOCX, TXT, and image files | 
| Hybrid Search | Combines BM25 (keyword) and FAISS (semantic) for superior retrieval | 
| Conversational AI | Powered by Groq's Llama 3.1 for intelligent responses | 
| Memory Management | Maintains chat history for contextual conversations | 
| Text-to-Speech | Built-in TTS for audio playback of responses | 
| Streaming Responses | Real-time token streaming for better UX | 
| Document Chunking | Intelligent text splitting for optimal context windows | 
### Advanced Features
- Semantic Embeddings: HuggingFace all-MiniLM-L6-v2 for accurate vector representations
- Reranking: Contextual compression for improved relevance (see the sketch after this list)
- Error Handling: Comprehensive fallback mechanisms and error recovery
- Security: Non-root Docker execution and environment-based secrets
- Scalability: Optimized for both local and cloud deployments
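
To make the reranking step concrete, the sketch below wires LangChain's contextual compression around a toy FAISS index. It is a minimal illustration, not the code in app.py; the example text and similarity threshold are assumptions.

```python
# Minimal reranking sketch using LangChain's contextual compression.
# The example text and similarity threshold are illustrative assumptions,
# not values taken from app.py.
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import EmbeddingsFilter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)

# A toy vector store standing in for the real document index.
vector_store = FAISS.from_texts(
    ["CogniChat lets you chat with uploaded documents."], embeddings
)

# Drop retrieved chunks whose similarity to the query falls below the threshold.
compressor = EmbeddingsFilter(embeddings=embeddings, similarity_threshold=0.5)
retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=vector_store.as_retriever(),
)

docs = retriever.invoke("What is CogniChat?")
```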
## 🏗️ Architecture

### RAG Pipeline Overview

```mermaid
graph TB
    A[Document Upload] --> B[Document Processing]
    B --> C[Text Extraction]
    C --> D[Chunking Strategy]
    D --> E[Embedding Generation]
    E --> F[Vector Store FAISS]
    
    G[User Query] --> H[Query Embedding]
    H --> I[Hybrid Retrieval]
    
    F --> I
    J[BM25 Index] --> I
    
    I --> K[Reranking]
    K --> L[Context Assembly]
    L --> M[LLM Groq Llama 3.1]
    M --> N[Response Generation]
    N --> O[Streaming Output]
    
    P[Chat History] --> M
    N --> P
    
    style A fill:#e1f5ff
    style G fill:#e1f5ff
    style F fill:#ffe1f5
    style J fill:#ffe1f5
    style M fill:#f5e1ff
    style O fill:#e1ffe1
```
### System Architecture

```mermaid
graph LR
    A[Client Browser] -->|HTTP/WebSocket| B[Flask Server]
    B --> C[Document Processor]
    B --> D[RAG Engine]
    B --> E[TTS Service]
    
    C --> F[(File Storage)]
    D --> G[(FAISS Vector DB)]
    D --> H[(BM25 Index)]
    D --> I[Groq API]
    
    J[HuggingFace Models] --> D
    
    style B fill:#4a90e2
    style D fill:#e24a90
    style I fill:#90e24a
```
### Data Flow

1. Document Ingestion: Files are uploaded and validated
2. Processing Pipeline: Text extraction → Chunking → Embedding
3. Indexing: Dual indexing (FAISS + BM25) for hybrid search
4. Query Processing: User queries are embedded and searched
5. Retrieval: Top-k relevant chunks retrieved using the hybrid approach
6. Generation: LLM generates contextual responses with citations
7. Streaming: Responses streamed back to the client in real time
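
As a concrete illustration of steps 1-3, here is a minimal sketch of the indexing half of the pipeline built from the LangChain components named above. The chunking parameters and retriever weights mirror the configuration documented below, but app.py's exact wiring may differ.

```python
# Indexing sketch: chunk text, embed it, and build the dual FAISS + BM25
# index behind the hybrid retriever. Values mirror the documented defaults
# and are not guaranteed to match app.py.
from langchain.retrievers import EnsembleRetriever
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.retrievers import BM25Retriever
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter

raw_text = "CogniChat indexes documents for hybrid retrieval. " * 100  # stand-in for extracted text

# Step 2: split the extracted text into overlapping chunks.
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.split_text(raw_text)

# Step 3: dual indexing - dense (FAISS) and sparse (BM25).
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
faiss_retriever = FAISS.from_texts(chunks, embeddings).as_retriever(search_kwargs={"k": 5})
bm25_retriever = BM25Retriever.from_texts(chunks)
bm25_retriever.k = 5

# Hybrid retrieval: weighted blend of semantic and keyword rankings.
hybrid_retriever = EnsembleRetriever(
    retrievers=[faiss_retriever, bm25_retriever],
    weights=[0.6, 0.4],
)
```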
## 🛠️ Technology Stack

### Backend
| Component | Technology | Purpose | 
|---|---|---|
| Framework | Flask 2.3+ | Web application framework | 
| RAG | LangChain | RAG pipeline orchestration | 
| Vector DB | FAISS | Fast similarity search | 
| Keyword Search | BM25 | Sparse retrieval | 
| LLM | Groq Llama 3.1 | Response generation | 
| Embeddings | HuggingFace Transformers | Semantic embeddings | 
| Doc Processing | Unstructured, PyPDF, python-docx | Multi-format parsing | 
### Frontend
| Component | Technology | 
|---|---|
| UI Framework | TailwindCSS | 
| JavaScript | Vanilla ES6+ | 
| Icons | Font Awesome | 
| Markdown | Marked.js | 
### Infrastructure
- Containerization: Docker + Docker Compose
- Deployment: HuggingFace Spaces, local, cloud-agnostic
- Security: Environment-based secrets, non-root execution
## 🚀 Quick Start

### Prerequisites
- Python 3.9+
- Docker (optional, recommended)
- Groq API Key (available from the Groq Console)
### Installation Methods

#### 🐳 Method 1: Docker (Recommended)

```bash
# Clone the repository
git clone https://github.com/RautRitesh/Chat-with-docs
cd cognichat
# Create environment file
cp .env.example .env
# Add your Groq API key to .env
echo "GROQ_API_KEY=your_actual_api_key_here" >> .env
# Build and run with Docker Compose
docker-compose up -d
# Or build manually
docker build -t cognichat .
docker run -p 7860:7860 --env-file .env cognichat
```
#### 🐍 Method 2: Local Python Environment

```bash
# Clone the repository
git clone https://github.com/RautRitesh/Chat-with-docs
cd cognichat
# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Set environment variables
export GROQ_API_KEY=your_actual_api_key_here
# Run the application
python app.py
```
#### 🤗 Method 3: HuggingFace Spaces

1. Fork this repository
2. Create a new Space on HuggingFace
3. Link your forked repository
4. Add `GROQ_API_KEY` in Settings → Repository Secrets
5. The Space will auto-deploy!
### First Steps

1. Open http://localhost:7860 in your browser
2. Upload a document (PDF, DOCX, TXT, or image)
3. Wait for processing (a progress indicator will show status)
4. Start chatting with your document!
5. Use the 🔊 button to hear responses via TTS
## 📦 Deployment

### Environment Variables

Create a `.env` file with the following variables:

```bash
# Required
GROQ_API_KEY=your_groq_api_key_here
# Optional
PORT=7860
HF_HOME=/tmp/huggingface_cache  # For HF Spaces
FLASK_DEBUG=0  # Set to 1 for development
MAX_UPLOAD_SIZE=10485760  # 10MB default
```
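
For local development, the application would typically read these variables along the following lines; this is a hypothetical helper for illustration, not code from app.py.

```python
# Illustrative settings loader (hypothetical, not from app.py): reads the
# variables documented above with safe defaults.
import os

GROQ_API_KEY = os.environ["GROQ_API_KEY"]  # required; raises KeyError if unset
PORT = int(os.environ.get("PORT", 7860))
FLASK_DEBUG = os.environ.get("FLASK_DEBUG", "0") == "1"
MAX_UPLOAD_SIZE = int(os.environ.get("MAX_UPLOAD_SIZE", 10 * 1024 * 1024))  # 10 MB
```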
### Docker Deployment

```bash
# Production build
docker build -t cognichat:latest .
# Run with resource limits
docker run -d \
  --name cognichat \
  -p 7860:7860 \
  --env-file .env \
  --memory="2g" \
  --cpus="1.5" \
  cognichat:latest
```
### Docker Compose

```yaml
version: '3.8'
services:
  cognichat:
    build: .
    ports:
      - "7860:7860"
    environment:
      - GROQ_API_KEY=${GROQ_API_KEY}
    volumes:
      - ./data:/app/data
    restart: unless-stopped
```
### HuggingFace Spaces Configuration

Set `app_port: 7860` in the README front matter, then add the Repository Secret:

- `GROQ_API_KEY`: Your Groq API key
The application automatically detects HF Spaces environment and adjusts paths accordingly.
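
One plausible way to implement that detection is to check for the `SPACE_ID` variable that HF Spaces sets; the exact logic in app.py may differ.

```python
# Hedged sketch of Spaces detection; app.py's actual check may differ.
import os

IS_HF_SPACE = "SPACE_ID" in os.environ  # set automatically on HF Spaces
UPLOAD_DIR = "/tmp/uploads" if IS_HF_SPACE else "./uploads"
os.makedirs(UPLOAD_DIR, exist_ok=True)
```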
## ⚙️ Configuration

### Document Processing Settings

```python
# In app.py - Customize these settings
CHUNK_SIZE = 1000  # Characters per chunk
CHUNK_OVERLAP = 200  # Overlap between chunks
EMBEDDING_MODEL = "sentence-transformers/all-MiniLM-L6-v2"
RETRIEVER_K = 5  # Number of chunks to retrieve
```
### Model Configuration

```python
# LLM Settings
LLM_PROVIDER = "groq"
MODEL_NAME = "llama-3.1-70b-versatile"
TEMPERATURE = 0.7
MAX_TOKENS = 2048
```
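
These settings map onto LangChain's Groq integration roughly as follows. This is a sketch assuming the langchain-groq package, not necessarily app.py's exact wiring.

```python
# Hedged sketch: the Groq-hosted Llama 3.1 chat model with the settings above.
import os

from langchain_groq import ChatGroq

llm = ChatGroq(
    api_key=os.environ["GROQ_API_KEY"],
    model="llama-3.1-70b-versatile",
    temperature=0.7,
    max_tokens=2048,
)

print(llm.invoke("Summarize RAG in one sentence.").content)
```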
### Search Configuration

```python
# Hybrid Search Weights
FAISS_WEIGHT = 0.6  # Semantic search weight
BM25_WEIGHT = 0.4   # Keyword search weight
```
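
LangChain's EnsembleRetriever blends the two ranked lists with weighted Reciprocal Rank Fusion. The toy example below shows what those weights do; the document IDs are made up and the constant c=60 is the common RRF default.

```python
# Toy weighted Reciprocal Rank Fusion, the scheme EnsembleRetriever uses to
# merge FAISS and BM25 rankings. Document IDs are illustrative.
from collections import defaultdict

def weighted_rrf(rankings, weights, c=60):
    """Merge ranked doc-id lists; a higher fused score ranks earlier."""
    scores = defaultdict(float)
    for ranked, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] += weight / (c + rank)
    return sorted(scores, key=scores.get, reverse=True)

faiss_hits = ["chunk_3", "chunk_1", "chunk_7"]  # semantic ranking
bm25_hits = ["chunk_1", "chunk_9", "chunk_3"]   # keyword ranking
print(weighted_rrf([faiss_hits, bm25_hits], weights=[0.6, 0.4]))
```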
## 📚 API Reference

### Endpoints

#### Upload Document

```http
POST /upload
Content-Type: multipart/form-data

{
  "file": <binary>
}
```
Response:

```json
{
  "status": "success",
  "message": "Document processed successfully",
  "filename": "example.pdf",
  "chunks": 45
}
```
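
A quick way to exercise the endpoint from Python; the base URL and file name are illustrative.

```python
# Hypothetical client call for /upload; the file name is illustrative.
import requests

with open("example.pdf", "rb") as f:
    resp = requests.post("http://localhost:7860/upload", files={"file": f})
print(resp.json())  # e.g. {"status": "success", ..., "chunks": 45}
```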
#### Chat

```http
POST /chat
Content-Type: application/json

{
  "message": "What is the main topic?",
  "stream": true
}
```
Response (Streaming):

```text
data: {"token": "The", "done": false}
data: {"token": " main", "done": false}
data: {"token": " topic", "done": false}
data: {"done": true}
```
#### Clear Session

```http
POST /clear
```

Response:

```json
{
  "status": "success",
  "message": "Session cleared"
}
```
## 🔧 Troubleshooting

### Common Issues

#### 1. Permission Errors in Docker
Problem: Permission denied when writing to cache directories

Solution:

```bash
# Rebuild with proper permissions
docker build --no-cache -t cognichat .
# Or run with volume permissions
docker run -v $(pwd)/cache:/tmp/huggingface_cache \
  --user $(id -u):$(id -g) \
  cognichat
```
#### 2. Model Loading Fails

Problem: Cannot download HuggingFace models

Solution:

```bash
# Pre-download models
python test_embeddings.py
# Or use HF_HOME environment variable
export HF_HOME=/path/to/writable/directory
```
#### 3. Chat Returns 400 Error

Problem: Upload directory not writable (common in HF Spaces)

Solution: The application automatically uses /tmp/uploads in the HF Spaces environment. Ensure the latest version is deployed.
#### 4. API Key Invalid
Problem: Groq API returns authentication error
Solution:
- Verify the key at the Groq Console
- Check that the `.env` file has the correct format: `GROQ_API_KEY=gsk_...`
- Restart application after updating key
### Debug Mode

Enable detailed logging:

```bash
export FLASK_DEBUG=1
export LANGCHAIN_VERBOSE=true
python app.py
```
## 🧪 Testing

```bash
# Run test suite
pytest tests/
# Test embedding model
python test_embeddings.py
# Test document processing
pytest tests/test_document_processor.py
# Integration tests
pytest tests/test_integration.py
```
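
For reference, a test in that suite might look like the following; the app import path and expected status codes are assumptions, not actual repository code.

```python
# Hypothetical test sketch; assumes app.py exposes a Flask `app` object.
import io

from app import app

def test_upload_txt_document():
    client = app.test_client()
    data = {"file": (io.BytesIO(b"hello world"), "hello.txt")}
    resp = client.post("/upload", data=data, content_type="multipart/form-data")
    assert resp.status_code == 200

def test_upload_rejects_empty_request():
    client = app.test_client()
    resp = client.post("/upload", data={})
    assert resp.status_code in (400, 422)
```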
## 🤝 Contributing
We welcome contributions! Please follow these steps:
1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
### Development Guidelines
- Follow PEP 8 style guide
- Add tests for new features
- Update documentation
- Ensure Docker build succeeds
## 📝 Changelog

### Version 2.0 (October 2025)

✅ Major Improvements:
- Fixed Docker permission issues
- HuggingFace Spaces compatibility
- Enhanced error handling
- Multiple model loading fallbacks
- Improved security (non-root execution)
✅ Bug Fixes:
- Upload directory write permissions
- Cache directory access
- Model initialization reliability
### Version 1.0 (Initial Release)
- Basic RAG functionality
- PDF and DOCX support
- FAISS vector store
- Conversational memory
## 📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
## 🙏 Acknowledgments
- LangChain for RAG framework
- Groq for high-speed LLM inference
- HuggingFace for embeddings and hosting
- FAISS for efficient vector search
## 📞 Support
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Email: riteshraut123321@gmail.com
Made with ❤️ by the CogniChat Team

⭐ Star us on GitHub • 🐛 Report Bug • ✨ Request Feature