# DeepSeek OCR Integration

This document explains how to use the DeepSeek OCR integration in your RAG system.

## Features

- **Text Extraction**: Extract text from images using DeepSeek OCR
- **Grounding**: Locate specific text within images
- **Markdown Conversion**: Convert document images to markdown format
- **RAG Integration**: Query the RAG system with OCR-extracted text
- **Multi-language Support**: Supports over 50 languages

## API Endpoints

### 1. Extract Text from Image

```
POST /ocr/extract-text/
```

- **Input**: Image file (multipart/form-data)
- **Optional**: Custom prompt
- **Output**: Extracted text

### 2. Extract Text with Grounding

```
POST /ocr/extract-with-grounding/
```

- **Input**: Image file + target text (optional)
- **Output**: Text with location information

### 3. Convert to Markdown

```
POST /ocr/convert-to-markdown/
```

- **Input**: Document image
- **Output**: Markdown-formatted text

### 4. Query with OCR Text

```
POST /ocr/query/
```

- **Input**: Query + conversation history + extracted text
- **Output**: RAG response enhanced with OCR text

## Frontend Usage

1. **Upload Image**: Click the "+" button in the input area
2. **Select Image**: Choose an image file from your device
3. **OCR Processing**: The system automatically extracts the text
4. **Options**:
   - **Use Extracted Text**: Copy the text to the input field
   - **Query with OCR**: Ask questions about the image content
   - **Cancel**: Close the OCR modal

## Configuration

Create a `.env` file with the following variables:

```env
# DeepSeek OCR Configuration
DEEPSEEK_OCR_MODEL=deepseek-ai/DeepSeek-OCR
DEEPSEEK_OCR_DEVICE=auto  # auto, cpu, cuda
DEEPSEEK_OCR_MAX_TOKENS=512
DEEPSEEK_OCR_TEMPERATURE=0.1

# Optional: Custom model path for local models
# DEEPSEEK_OCR_MODEL_PATH=/path/to/local/model

# Optional: Hugging Face token for private models
# HF_TOKEN=your_huggingface_token_here
```

## Installation

1. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

2. Set up environment variables (optional):

   ```bash
   cp .env.example .env
   # Edit .env with your configuration
   ```

3. Run the application:

   ```bash
   uvicorn main:app --reload
   ```

## Model Requirements

### For CPU (Laptop) Setup:

- **RAM**: At least 8GB (16GB recommended)
- **Storage**: ~2GB for model download
- **CPU**: Modern multi-core processor (Intel i5/AMD Ryzen 5 or better)
- **Performance**: Expect 10-30 seconds per image on CPU

### For GPU Setup:

- **GPU**: CUDA-compatible NVIDIA GPU
- **VRAM**: At least 4GB
- **RAM**: 16GB+ recommended
- **Performance**: Expect 2-5 seconds per image on GPU

## Performance Tips

### For CPU (Laptop) Users:

1. **CPU Optimization**: The default configuration already targets CPU usage
2. **Image Size**: Keep images at or below 1024x1024 pixels for faster processing (see the downscaling example under Examples)
3. **Memory Management**: Close other applications to free up RAM
4. **Model Caching**: The model is cached after the first load
5. **Processing Time**: Expect 10-30 seconds per image on CPU

### For GPU Users:

1. **GPU Usage**: Set `DEEPSEEK_OCR_DEVICE=cuda` for GPU acceleration
2. **Batch Processing**: Process multiple images efficiently
3. **Memory Management**: Monitor GPU memory usage for large images

## Error Handling

The system includes comprehensive error handling for:

- File type validation
- Model loading errors
- OCR processing failures
- Network connectivity issues

## Examples

### Basic Text Extraction

```python
import requests

# Upload an image and extract its text
with open('image.jpg', 'rb') as f:
    response = requests.post(
        'http://localhost:8000/ocr/extract-text/',
        files={'file': f}
    )

result = response.json()
print(result['extracted_text'])
```

### Query with OCR

```python
import requests

# Ask a question about previously extracted text
response = requests.post(
    'http://localhost:8000/ocr/query/',
    json={
        'query': 'What is the main topic?',
        'conversation_history': [],
        'extracted_text': 'Your extracted text here...'
    }
)
print(response.json())
```
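### Document-to-Markdown Conversion

A minimal sketch of calling the markdown-conversion endpoint, following the same multipart upload pattern as the text-extraction example above. The key under which the markdown appears in the JSON response is not specified in this document, so the sketch prints the whole payload; `document.png` is just a placeholder filename.

```python
import requests

# Upload a document image and request markdown conversion
with open('document.png', 'rb') as f:
    response = requests.post(
        'http://localhost:8000/ocr/convert-to-markdown/',
        files={'file': f}
    )

# Inspect the full JSON payload; the field holding the markdown output
# depends on the backend implementation.
print(response.json())
```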
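### Text Extraction with Grounding

A sketch of the grounding endpoint. The image upload mirrors the other endpoints; the target text is optional, and the form field name used for it here (`target_text`) is an assumption, so check the route definition in your backend before relying on it.

```python
import requests

# Upload an image and ask the OCR service to locate a specific string
with open('image.jpg', 'rb') as f:
    response = requests.post(
        'http://localhost:8000/ocr/extract-with-grounding/',
        files={'file': f},
        # 'target_text' is an assumed field name for the optional target text;
        # omit the data argument to ground all detected text.
        data={'target_text': 'Invoice Number'}
    )

print(response.json())
```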
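### Downscaling Images Before Upload

As suggested in the Performance Tips, downscaling large images before upload noticeably reduces CPU processing time. This sketch uses Pillow, which is not necessarily part of `requirements.txt`, so install it separately if needed; `thumbnail` shrinks the image in place to fit within 1024x1024 pixels while preserving the aspect ratio.

```python
import requests
from PIL import Image  # pip install pillow

# Shrink the image to fit within 1024x1024 pixels (aspect ratio preserved)
image = Image.open('large_scan.jpg')
image.thumbnail((1024, 1024))
image.save('large_scan_small.jpg')

# Send the smaller file to the OCR endpoint
with open('large_scan_small.jpg', 'rb') as f:
    response = requests.post(
        'http://localhost:8000/ocr/extract-text/',
        files={'file': f}
    )

print(response.json()['extracted_text'])
```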
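### Handling Errors on the Client

The backend validates file types and reports model, OCR, and network failures (see Error Handling above), but how those map to HTTP status codes is backend-specific. This sketch simply treats any non-2xx response as an error and separates connection problems from rejected requests.

```python
import requests

try:
    with open('image.jpg', 'rb') as f:
        response = requests.post(
            'http://localhost:8000/ocr/extract-text/',
            files={'file': f},
            timeout=120  # OCR on CPU can take tens of seconds per image
        )
    # Raise for 4xx/5xx responses (e.g. failed file type validation or OCR errors);
    # the exact status codes depend on the backend.
    response.raise_for_status()
    print(response.json()['extracted_text'])
except requests.exceptions.HTTPError as err:
    print(f'OCR request rejected: {err}')
except requests.exceptions.RequestException as err:
    print(f'Network or connection problem: {err}')
```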
## Troubleshooting

### Common Issues

1. **Model Loading Error**: Ensure you have sufficient RAM/VRAM
2. **CUDA Error**: Check GPU compatibility and drivers
3. **Memory Error**: Reduce image size or use CPU mode
4. **Network Error**: Check internet connection for model download

### Debug Mode

Enable debug logging:

```python
import logging

logging.basicConfig(level=logging.DEBUG)
```

## Support

For issues and questions:

1. Check the logs for error messages
2. Verify your environment configuration
3. Test with smaller images first
4. Check GPU memory usage

## License

This integration uses DeepSeek OCR, which is licensed under Apache 2.0.