# DeepSeek OCR Integration

This document explains how to use the DeepSeek OCR integration in your RAG system.

## Features

- **Text Extraction**: Extract text from images using DeepSeek OCR
- **Grounding**: Locate specific text within images
- **Markdown Conversion**: Convert document images to markdown format
- **RAG Integration**: Query the RAG system with OCR-extracted text
- **Multi-language Support**: Supports over 50 languages

## API Endpoints

### 1. Extract Text from Image

```
POST /ocr/extract-text/
```

- **Input**: Image file (multipart/form-data)
- **Optional**: Custom prompt
- **Output**: Extracted text

### 2. Extract Text with Grounding

```
POST /ocr/extract-with-grounding/
```

- **Input**: Image file + target text (optional)
- **Output**: Text with location information

### 3. Convert to Markdown

```
POST /ocr/convert-to-markdown/
```

- **Input**: Document image
- **Output**: Markdown-formatted text

### 4. Query with OCR Text

```
POST /ocr/query/
```

- **Input**: Query + conversation history + extracted text
- **Output**: RAG response enhanced with OCR text

## Frontend Usage

1. **Upload Image**: Click the "+" button in the input area
2. **Select Image**: Choose an image file from your device
3. **OCR Processing**: The system automatically extracts the text
4. **Options**:
   - **Use Extracted Text**: Copy the text to the input field
   - **Query with OCR**: Ask questions about the image content
   - **Cancel**: Close the OCR modal

## Configuration

Create a `.env` file with the following variables:

```env
# DeepSeek OCR Configuration
DEEPSEEK_OCR_MODEL=deepseek-ai/DeepSeek-OCR
DEEPSEEK_OCR_DEVICE=auto  # auto, cpu, cuda
DEEPSEEK_OCR_MAX_TOKENS=512
DEEPSEEK_OCR_TEMPERATURE=0.1

# Optional: Custom model path for local models
# DEEPSEEK_OCR_MODEL_PATH=/path/to/local/model

# Optional: Hugging Face token for private models
# HF_TOKEN=your_huggingface_token_here
```

## Installation

1. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

2. Set up environment variables (optional):

   ```bash
   cp .env.example .env
   # Edit .env with your configuration
   ```

3. Run the application:

   ```bash
   uvicorn main:app --reload
   ```

## Model Requirements

### For CPU (Laptop) Setup:

- **RAM**: At least 8GB (16GB recommended)
- **Storage**: ~2GB for model download
- **CPU**: Modern multi-core processor (Intel i5/AMD Ryzen 5 or better)
- **Performance**: Expect 10-30 seconds per image on CPU

### For GPU Setup:

- **GPU**: CUDA-compatible NVIDIA GPU
- **VRAM**: At least 4GB
- **RAM**: 16GB+ recommended
- **Performance**: Expect 2-5 seconds per image on GPU

## Performance Tips

### For CPU (Laptop) Users:

1. **CPU Optimization**: The default configuration already targets CPU usage
2. **Image Size**: Keep images at or below 1024x1024 pixels for faster processing (see the downscaling example under Examples)
3. **Memory Management**: Close other applications to free up RAM
4. **Model Caching**: The model is cached after the first load
5. **Processing Time**: Expect 10-30 seconds per image on CPU

### For GPU Users:

1. **GPU Usage**: Set `DEEPSEEK_OCR_DEVICE=cuda` for GPU acceleration
2. **Batch Processing**: Process multiple images efficiently
3. **Memory Management**: Monitor GPU memory usage for large images

## Error Handling

The system includes comprehensive error handling for:

- File type validation
- Model loading errors
- OCR processing failures
- Network connectivity issues

## Examples

### Basic Text Extraction

```python
import requests

# Upload an image and extract its text
with open('image.jpg', 'rb') as f:
    response = requests.post(
        'http://localhost:8000/ocr/extract-text/',
        files={'file': f}
    )

result = response.json()
print(result['extracted_text'])
```

### Query with OCR

```python
import requests

# Ask a question about previously extracted text
response = requests.post(
    'http://localhost:8000/ocr/query/',
    json={
        'query': 'What is the main topic?',
        'conversation_history': [],
        'extracted_text': 'Your extracted text here...'
    }
)
print(response.json())
```
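### Document-to-Markdown Conversion

A minimal sketch of calling the markdown-conversion endpoint, following the same multipart upload pattern as the text-extraction example above. The key under which the markdown appears in the JSON response is not specified in this document, so the sketch prints the whole payload; `document.png` is just a placeholder filename.

```python
import requests

# Upload a document image and request markdown conversion
with open('document.png', 'rb') as f:
    response = requests.post(
        'http://localhost:8000/ocr/convert-to-markdown/',
        files={'file': f}
    )

# Inspect the full JSON payload; the field holding the markdown output
# depends on the backend implementation.
print(response.json())
```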
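### Text Extraction with Grounding

A sketch of the grounding endpoint. The image upload mirrors the other endpoints; the target text is optional, and the form field name used for it here (`target_text`) is an assumption, so check the route definition in your backend before relying on it.

```python
import requests

# Upload an image and ask the OCR service to locate a specific string
with open('image.jpg', 'rb') as f:
    response = requests.post(
        'http://localhost:8000/ocr/extract-with-grounding/',
        files={'file': f},
        # 'target_text' is an assumed field name for the optional target text;
        # omit the data argument to ground all detected text.
        data={'target_text': 'Invoice Number'}
    )

print(response.json())
```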
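### Downscaling Images Before Upload

As suggested in the Performance Tips, downscaling large images before upload noticeably reduces CPU processing time. This sketch uses Pillow, which is not necessarily part of `requirements.txt`, so install it separately if needed; `thumbnail` shrinks the image in place to fit within 1024x1024 pixels while preserving the aspect ratio.

```python
import requests
from PIL import Image  # pip install pillow

# Shrink the image to fit within 1024x1024 pixels (aspect ratio preserved)
image = Image.open('large_scan.jpg')
image.thumbnail((1024, 1024))
image.save('large_scan_small.jpg')

# Send the smaller file to the OCR endpoint
with open('large_scan_small.jpg', 'rb') as f:
    response = requests.post(
        'http://localhost:8000/ocr/extract-text/',
        files={'file': f}
    )

print(response.json()['extracted_text'])
```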
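### Handling Errors on the Client

The backend validates file types and reports model, OCR, and network failures (see Error Handling above), but how those map to HTTP status codes is backend-specific. This sketch simply treats any non-2xx response as an error and separates connection problems from rejected requests.

```python
import requests

try:
    with open('image.jpg', 'rb') as f:
        response = requests.post(
            'http://localhost:8000/ocr/extract-text/',
            files={'file': f},
            timeout=120  # OCR on CPU can take tens of seconds per image
        )
    # Raise for 4xx/5xx responses (e.g. failed file type validation or OCR errors);
    # the exact status codes depend on the backend.
    response.raise_for_status()
    print(response.json()['extracted_text'])
except requests.exceptions.HTTPError as err:
    print(f'OCR request rejected: {err}')
except requests.exceptions.RequestException as err:
    print(f'Network or connection problem: {err}')
```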
## Troubleshooting

### Common Issues

1. **Model Loading Error**: Ensure you have sufficient RAM/VRAM
2. **CUDA Error**: Check GPU compatibility and drivers
3. **Memory Error**: Reduce image size or use CPU mode
4. **Network Error**: Check internet connection for model download

### Debug Mode

Enable debug logging:

```python
import logging

logging.basicConfig(level=logging.DEBUG)
```

## Support

For issues and questions:

1. Check the logs for error messages
2. Verify your environment configuration
3. Test with smaller images first
4. Check GPU memory usage

## License

This integration uses DeepSeek OCR, which is licensed under Apache 2.0.