Space: yadavkapil23/Corex

Commit fcfe360 by yadavkapil23, 4 days ago (parent: d90bce4)

updated req

Files changed (2):

README_OCR.md +183 -0
requirements.txt +10 -2

README_OCR.md ADDED

	@@ -0,0 +1,183 @@

+# DeepSeek OCR Integration
+This document explains how to use the DeepSeek OCR integration in your RAG system.
+## Features
+- **Text Extraction**: Extract text from images using DeepSeek OCR
+- **Grounding**: Locate specific text within images
+- **Markdown Conversion**: Convert document images to markdown format
+- **RAG Integration**: Query the RAG system with OCR-extracted text
+- **Multi-language Support**: Supports over 50 languages
+## API Endpoints
+### 1. Extract Text from Image
+```
+POST /ocr/extract-text/
+```
+- **Input**: Image file (multipart/form-data)
+- **Optional**: Custom prompt
+- **Output**: Extracted text
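The same upload can be made with only the standard library. The sketch below builds the multipart body by hand. It assumes the image goes in a form field named `file` (as in the `requests` example under Examples) and that the optional custom prompt is sent as a form field named `prompt`. Both the `prompt` field name and the `build_extract_request` helper are hypothetical, so check them against the actual route.

```python
import mimetypes
import urllib.request
import uuid
from typing import Optional


def build_extract_request(
    url: str, image_bytes: bytes, filename: str, prompt: Optional[str] = None
) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST for /ocr/extract-text/.

    Assumption: the image field is "file" and the optional prompt field is
    "prompt" (hypothetical; check the route's parameters).
    """
    boundary = uuid.uuid4().hex
    # Declare an image content type so server-side type checks can see it
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    body = b""
    if prompt is not None:
        body += (
            f"--{boundary}\r\n"
            'Content-Disposition: form-data; name="prompt"\r\n\r\n'
            f"{prompt}\r\n"
        ).encode("utf-8")
    body += (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    ).encode("utf-8")
    body += image_bytes + b"\r\n"
    # Closing boundary ends the multipart body
    body += f"--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )
```

With the server running, `urllib.request.urlopen(req)` sends the request and returns the JSON response body.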
+### 2. Extract Text with Grounding
+```
+POST /ocr/extract-with-grounding/
+```
+- **Input**: Image file + target text (optional)
+- **Output**: Text with location information
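This README does not specify the response format. DeepSeek-OCR itself marks grounded text as `<|ref|>TEXT<|/ref|><|det|>[[x1, y1, x2, y2]]<|/det|>`, with box coordinates on a 0-999 grid relative to the image size. Assuming the endpoint passes that raw output through, a sketch like this converts it into pixel boxes. `parse_grounding` and `Grounded` are hypothetical names.

```python
import ast
import re
from dataclasses import dataclass
from typing import List

# Assumed DeepSeek-OCR grounding markup: a label followed by one or more boxes
GROUNDING_RE = re.compile(
    r"<\|ref\|>(?P<label>.*?)<\|/ref\|><\|det\|>(?P<boxes>.*?)<\|/det\|>",
    re.DOTALL,
)


@dataclass
class Grounded:
    label: str
    box: tuple  # (x1, y1, x2, y2) in pixels


def parse_grounding(raw: str, image_width: int, image_height: int, grid: int = 999) -> List[Grounded]:
    """Convert grid-normalized boxes (0..grid) into pixel coordinates."""
    results = []
    for match in GROUNDING_RE.finditer(raw):
        # The box list is a Python-style literal, e.g. [[x1, y1, x2, y2], ...]
        boxes = ast.literal_eval(match.group("boxes"))
        for x1, y1, x2, y2 in boxes:
            results.append(
                Grounded(
                    label=match.group("label").strip(),
                    box=(
                        round(x1 / grid * image_width),
                        round(y1 / grid * image_height),
                        round(x2 / grid * image_width),
                        round(y2 / grid * image_height),
                    ),
                )
            )
    return results
```

Rounding to the nearest pixel is a choice made here; truncating with `int()` instead would shift boxes by at most one pixel.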
+### 3. Convert to Markdown
+```
+POST /ocr/convert-to-markdown/
+```
+- **Input**: Document image
+- **Output**: Markdown formatted text
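In document-to-markdown mode (the model card's prompt is `<|grounding|>Convert the document to markdown.`), DeepSeek-OCR interleaves `<|ref|>`/`<|det|>` layout tags (block type plus box) with the markdown. Assuming the endpoint receives raw output of that form, a cleanup step could look like this. `strip_grounding` is a hypothetical helper, not a function from this repository.

```python
import re

# One layout annotation: <|ref|>block type<|/ref|><|det|>[[...]]<|/det|>
TAG_RE = re.compile(r"<\|ref\|>.*?<\|/ref\|><\|det\|>.*?<\|/det\|>", re.DOTALL)


def strip_grounding(raw: str) -> str:
    """Remove layout annotations and collapse the blank lines they leave behind."""
    text = TAG_RE.sub("", raw)
    # Keep at most one empty line between blocks
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()
```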
+### 4. Query with OCR Text
+```
+POST /ocr/query/
+```
+- **Input**: Query + conversation history + extracted text
+- **Output**: RAG response enhanced with OCR text
+## Frontend Usage
+1. **Upload Image**: Click the "+" button in the input area
+2. **Select Image**: Choose an image file from your device
+3. **OCR Processing**: The system will automatically extract text
+4. **Options**:
+   - **Use Extracted Text**: Copy the text to the input field
+   - **Query with OCR**: Ask questions about the image content
+   - **Cancel**: Close the OCR modal
+## Configuration
+Create a `.env` file with the following variables:
+```env
+# DeepSeek OCR Configuration
+DEEPSEEK_OCR_MODEL=deepseek-ai/DeepSeek-OCR
+# Device: auto, cpu, or cuda
+DEEPSEEK_OCR_DEVICE=auto
+DEEPSEEK_OCR_MAX_TOKENS=512
+DEEPSEEK_OCR_TEMPERATURE=0.1
+# Optional: Custom model path for local models
+# DEEPSEEK_OCR_MODEL_PATH=/path/to/local/model
+# Optional: Hugging Face token for private models
+# HF_TOKEN=your_huggingface_token_here
+```
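The app's own settings code is not part of this commit. A minimal sketch of how these variables could be read and validated with the standard library, using the defaults shown above. `OCRConfig` and `load_ocr_config` are hypothetical names, and the precedence of `DEEPSEEK_OCR_MODEL_PATH` over the hub ID is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

VALID_DEVICES = {"auto", "cpu", "cuda"}


@dataclass(frozen=True)
class OCRConfig:
    model: str
    device: str
    max_tokens: int
    temperature: float
    model_path: Optional[str]

    @property
    def source(self) -> str:
        # Assumption: a local path, when set, is loaded instead of the hub ID
        return self.model_path or self.model


def load_ocr_config(env: Mapping[str, str] = os.environ) -> OCRConfig:
    """Read DEEPSEEK_OCR_* settings, falling back to the documented defaults."""
    device = env.get("DEEPSEEK_OCR_DEVICE", "auto").strip().lower()
    if device not in VALID_DEVICES:
        raise ValueError(f"DEEPSEEK_OCR_DEVICE must be one of {sorted(VALID_DEVICES)}, got {device!r}")
    max_tokens = int(env.get("DEEPSEEK_OCR_MAX_TOKENS", "512"))
    if max_tokens <= 0:
        raise ValueError("DEEPSEEK_OCR_MAX_TOKENS must be positive")
    return OCRConfig(
        model=env.get("DEEPSEEK_OCR_MODEL", "deepseek-ai/DeepSeek-OCR"),
        device=device,
        max_tokens=max_tokens,
        temperature=float(env.get("DEEPSEEK_OCR_TEMPERATURE", "0.1")),
        model_path=env.get("DEEPSEEK_OCR_MODEL_PATH") or None,
    )
```

If the variables live in a `.env` file, a loader such as python-dotenv's `load_dotenv()` has to run before `load_ocr_config()`; python-dotenv is not among the packages this commit adds.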
+## Installation
+1. Install dependencies:
+```bash
+pip install -r requirements.txt
+```
+2. Set up environment variables (optional):
+```bash
+cp .env.example .env
+# Edit .env with your configuration
+```
+3. Run the application:
+```bash
+uvicorn main:app --reload
+```
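Before the first run, it can help to confirm that the model-side packages this commit adds to `requirements.txt` are importable in the active environment. A small stdlib sketch; the mapping from pip names to import names (for example, `Pillow` provides the `PIL` module) is filled in by hand, and `check_imports` is a hypothetical helper.

```python
import importlib.util

# pip package name -> top-level module it provides
REQUIRED_MODULES = {
    "torch": "torch",
    "torchvision": "torchvision",
    "transformers": "transformers",
    "accelerate": "accelerate",
    "Pillow": "PIL",
    "aiofiles": "aiofiles",
    "addict": "addict",
    "einops": "einops",
    "easydict": "easydict",
    "matplotlib": "matplotlib",
}


def check_imports(modules=REQUIRED_MODULES):
    """Return {package: True/False}; find_spec locates modules without importing them."""
    return {
        package: importlib.util.find_spec(module) is not None
        for package, module in modules.items()
    }


if __name__ == "__main__":
    missing = [pkg for pkg, ok in check_imports().items() if not ok]
    print("Missing:", ", ".join(missing) if missing else "none")
```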
+## Model Requirements
+### For CPU (Laptop) Setup:
+- **RAM**: At least 8GB (16GB recommended)
+- **Storage**: ~6-7GB for the model download (roughly 3B parameters stored in BF16)
+- **CPU**: Modern multi-core processor (Intel i5/AMD Ryzen 5 or better)
+- **Performance**: Expect 10-30 seconds per image on CPU
+### For GPU Setup:
+- **GPU**: CUDA compatible (NVIDIA)
+- **VRAM**: Enough for the ~6-7GB of BF16 weights plus activations (4GB is not sufficient)
+- **RAM**: 16GB+ recommended
+- **Performance**: Expect 2-5 seconds per image on GPU
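A rough way to check these figures against your hardware is to multiply the parameter count by the bytes per parameter of the dtype the weights are loaded in. The sketch covers weights only (activations and framework overhead come on top), and the ~3B parameter count in the example is an assumption to verify against the model card.

```python
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2, "int8": 1}


def weight_memory_gb(num_params: float, dtype: str = "bfloat16") -> float:
    """Memory needed just to hold the weights, in GB (10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    params = 3e9  # assumption: roughly 3B parameters; check the model card
    for dtype in ("bfloat16", "float32"):
        print(f"{dtype}: ~{weight_memory_gb(params, dtype):.1f} GB")
```

If the weights end up in float32 rather than BF16, the figure doubles.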
+## Performance Tips
+### For CPU (Laptop) Users:
+1. **CPU Optimization**: Already configured for CPU usage
+2. **Image Size**: Keep images at or below 1024x1024 pixels for faster processing
+3. **Memory Management**: Close other applications to free up RAM
+4. **Model Caching**: The model is cached after first load
+5. **Processing Time**: Expect 10-30 seconds per image on CPU
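To stay within the suggested 1024x1024 limit without distorting the page, scale both sides by the same factor. Pillow's `Image.thumbnail((1024, 1024))` performs this kind of aspect-preserving downscale in place; the stdlib sketch below only shows the arithmetic, with `fit_within` as a hypothetical helper.

```python
def fit_within(width: int, height: int, max_side: int = 1024) -> tuple:
    """Return (new_width, new_height) fitting in max_side x max_side, keeping aspect ratio.

    Images that already fit are returned unchanged (never upscaled).
    """
    scale = min(max_side / width, max_side / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))
```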
+### For GPU Users:
+1. **GPU Usage**: Set `DEEPSEEK_OCR_DEVICE=cuda` for GPU acceleration
+2. **Batch Processing**: Process multiple images efficiently
+3. **Memory Management**: Monitor GPU memory usage for large images
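This README does not tie batch processing to a specific endpoint. One client-side reading of it, sketched under that assumption: send several images with a small, bounded number of concurrent requests, since a single GPU largely processes them one at a time anyway. `extract_many` is hypothetical and takes the per-image call as a parameter, so it can wrap the `requests` call from the Examples section.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def extract_many(
    paths: Iterable[str],
    extract: Callable[[str], str],
    max_workers: int = 2,
) -> Dict[str, object]:
    """Run `extract` on each path with bounded concurrency.

    Returns {path: text} on success or {path: exception} on failure, so one bad
    image does not abort the whole batch.
    """
    paths = list(paths)
    results: Dict[str, object] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {path: pool.submit(extract, path) for path in paths}
        for path, future in futures.items():
            try:
                results[path] = future.result()
            except Exception as exc:  # record the error and keep going
                results[path] = exc
    return results
```

Keeping `max_workers` low avoids queuing many large images on the server at once, which matters when GPU memory is the bottleneck.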
+## Error Handling
+The system handles the following error cases:
+- File type validation
+- Model loading errors
+- OCR processing failures
+- Network connectivity issues
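The README does not say how file types are validated. One common approach, sketched here as an assumption rather than this app's code, is to check the file's leading magic bytes instead of trusting the filename or the client-sent content type. `sniff_image_type` is a hypothetical helper.

```python
from typing import Optional

# Leading-byte signatures for image formats an OCR upload is likely to use
_SIGNATURES = (
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"II*\x00", "image/tiff"),  # little-endian TIFF
    (b"MM\x00*", "image/tiff"),  # big-endian TIFF
)


def sniff_image_type(head: bytes) -> Optional[str]:
    """Return the MIME type implied by a file's first bytes, or None if unrecognized."""
    # WebP: "RIFF", a 4-byte size, then "WEBP"
    if len(head) >= 12 and head[:4] == b"RIFF" and head[8:12] == b"WEBP":
        return "image/webp"
    for signature, mime in _SIGNATURES:
        if head.startswith(signature):
            return mime
    return None
```

In a FastAPI handler, `head = await file.read(16)` followed by `await file.seek(0)` gives the bytes to check without consuming the upload.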
+## Examples
+### Basic Text Extraction
+```python
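+import requests
+# Upload an image and extract its text
+with open('image.jpg', 'rb') as f:
+    response = requests.post(
+        'http://localhost:8000/ocr/extract-text/',
+        # (filename, file object, MIME type) so the server sees an image content type
+        files={'file': ('image.jpg', f, 'image/jpeg')},
+    )
+response.raise_for_status()
+result = response.json()
+print(result['extracted_text'])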
+```
+### Query with OCR
+```python
+import requests
+# Ask a question about previously extracted text
+response = requests.post(
+    'http://localhost:8000/ocr/query/',
+    json={
+        'query': 'What is the main topic?',
+        'conversation_history': [],
+        'extracted_text': 'Your extracted text here...'
+    },
+)
+response.raise_for_status()
+print(response.json())
+```
+## Troubleshooting
+### Common Issues
+1. **Model Loading Error**: Ensure you have sufficient RAM/VRAM
+2. **CUDA Error**: Check GPU compatibility and drivers
+3. **Memory Error**: Reduce image size or use CPU mode
+4. **Network Error**: Check internet connection for model download
+### Debug Mode
+Enable debug logging:
+```python
+import logging
+logging.basicConfig(level=logging.DEBUG)
+```
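Setting the root logger to DEBUG also turns on verbose output from third-party libraries; Pillow, for instance, logs each PNG chunk it reads at DEBUG level. A sketch that keeps debug output for the app while quieting a few chatty loggers. The logger names are examples to adjust, and `configure_debug_logging` is a hypothetical helper. When the server is started with uvicorn, `--log-level debug` raises uvicorn's own logging to the same level.

```python
import logging

NOISY_LOGGERS = ("PIL", "urllib3", "matplotlib", "filelock")


def configure_debug_logging(quiet=NOISY_LOGGERS):
    """Enable DEBUG output, but cap chatty third-party loggers at WARNING."""
    # Note: basicConfig does nothing if the root logger already has handlers
    logging.basicConfig(
        level=logging.DEBUG,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
    )
    for name in quiet:
        logging.getLogger(name).setLevel(logging.WARNING)


configure_debug_logging()
logging.getLogger(__name__).debug("debug logging enabled")
```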
+## Support
+For issues and questions:
+1. Check the logs for error messages
+2. Verify your environment configuration
+3. Test with smaller images first
+4. Check GPU memory usage
+## License
+This integration uses DeepSeek OCR, which is released under the MIT License.

requirements.txt CHANGED

@@ -10,5 +10,13 @@ wikipedia
 pypdf
 sentence-transformers
 torch
-transformers
-accelerate
+transformers>=4.36.0
+accelerate
+Pillow
+python-multipart
+aiofiles
+addict
+einops
+easydict
+matplotlib
+torchvision