# DeepSeek OCR Integration
This document explains how to use the DeepSeek OCR integration in your RAG system.
## Features

- **Text Extraction**: Extract text from images using DeepSeek OCR
- **Grounding**: Locate specific text within images
- **Markdown Conversion**: Convert document images to Markdown format
- **RAG Integration**: Query the RAG system with OCR-extracted text
- **Multi-language Support**: Supports over 50 languages
## API Endpoints

### 1. Extract Text from Image

`POST /ocr/extract-text/`

- **Input**: Image file (`multipart/form-data`)
- **Optional**: Custom prompt
- **Output**: Extracted text
### 2. Extract Text with Grounding

`POST /ocr/extract-with-grounding/`

- **Input**: Image file + target text (optional)
- **Output**: Text with location information
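A minimal sketch of calling this endpoint with `requests`; the `target_text` form field name is an assumption based on the description above, so check the interactive API docs for the exact name:

```python
import requests

# Hypothetical grounding request: the 'target_text' form field
# name is assumed from the endpoint description above.
with open('invoice.jpg', 'rb') as f:
    response = requests.post(
        'http://localhost:8000/ocr/extract-with-grounding/',
        files={'file': f},
        data={'target_text': 'Total Amount'}  # optional per the docs
    )
print(response.json())  # extracted text plus location information
```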
### 3. Convert to Markdown

`POST /ocr/convert-to-markdown/`

- **Input**: Document image
- **Output**: Markdown-formatted text
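A similar sketch for Markdown conversion; the response field names are not specified above, so the example prints the full JSON payload:

```python
import requests

# Convert a scanned document page to Markdown.
with open('document.png', 'rb') as f:
    response = requests.post(
        'http://localhost:8000/ocr/convert-to-markdown/',
        files={'file': f}
    )
print(response.json())  # inspect the payload for the Markdown field
```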
### 4. Query with OCR Text

`POST /ocr/query/`

- **Input**: Query + conversation history + extracted text
- **Output**: RAG response enhanced with OCR text
## Frontend Usage

1. **Upload Image**: Click the "+" button in the input area
2. **Select Image**: Choose an image file from your device
3. **OCR Processing**: The system automatically extracts the text
4. **Options**:
   - **Use Extracted Text**: Copy the text to the input field
   - **Query with OCR**: Ask questions about the image content
   - **Cancel**: Close the OCR modal
## Configuration

Create a `.env` file with the following variables:

```env
# DeepSeek OCR Configuration
DEEPSEEK_OCR_MODEL=deepseek-ai/DeepSeek-OCR
DEEPSEEK_OCR_DEVICE=auto  # auto, cpu, cuda
DEEPSEEK_OCR_MAX_TOKENS=512
DEEPSEEK_OCR_TEMPERATURE=0.1

# Optional: Custom model path for local models
# DEEPSEEK_OCR_MODEL_PATH=/path/to/local/model

# Optional: Hugging Face token for private models
# HF_TOKEN=your_huggingface_token_here
```
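As a sketch of how these variables might be read at startup (assuming `python-dotenv`, which this document does not confirm is in `requirements.txt`):

```python
import os

from dotenv import load_dotenv  # assumption: python-dotenv is installed

load_dotenv()  # read .env from the working directory

MODEL_NAME = os.getenv("DEEPSEEK_OCR_MODEL", "deepseek-ai/DeepSeek-OCR")
DEVICE = os.getenv("DEEPSEEK_OCR_DEVICE", "auto")
MAX_TOKENS = int(os.getenv("DEEPSEEK_OCR_MAX_TOKENS", "512"))
TEMPERATURE = float(os.getenv("DEEPSEEK_OCR_TEMPERATURE", "0.1"))
```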
## Installation

1. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

2. Set up environment variables (optional):

   ```bash
   cp .env.example .env
   # Edit .env with your configuration
   ```

3. Run the application:

   ```bash
   uvicorn main:app --reload
   ```
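Once the server is running, a quick smoke test from Python (apps launched via `uvicorn main:app` typically serve FastAPI's interactive docs at `/docs`, though this document does not confirm the framework):

```python
import requests

# Smoke test: expect HTTP 200 if the app started correctly.
response = requests.get('http://localhost:8000/docs')
print(response.status_code)
```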
## Model Requirements

### For CPU (Laptop) Setup

- **RAM**: At least 8 GB (16 GB recommended)
- **Storage**: ~2 GB for the model download
- **CPU**: Modern multi-core processor (Intel i5 / AMD Ryzen 5 or better)
- **Performance**: Expect 10-30 seconds per image
### For GPU Setup

- **GPU**: CUDA-compatible (NVIDIA)
- **VRAM**: At least 4 GB
- **RAM**: 16 GB+ recommended
- **Performance**: Expect 2-5 seconds per image
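The `auto` device setting in the configuration above picks between these two setups. A common resolution pattern looks like the following sketch, assuming a PyTorch backend (the actual logic lives in the service code):

```python
import torch

def resolve_device(setting: str = "auto") -> str:
    """Map DEEPSEEK_OCR_DEVICE to a concrete torch device string."""
    if setting == "auto":
        return "cuda" if torch.cuda.is_available() else "cpu"
    return setting  # explicit "cpu" or "cuda"

print(resolve_device("auto"))
```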
## Performance Tips

### For CPU (Laptop) Users

- **CPU Optimization**: The default configuration already targets CPU usage
- **Image Size**: Keep images at or below 1024x1024 pixels for faster processing (see the resize sketch after this list)
- **Memory Management**: Close other applications to free up RAM
- **Model Caching**: The model is cached after the first load
- **Processing Time**: Expect 10-30 seconds per image on CPU
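A minimal resize sketch using Pillow (an assumption; any imaging library works):

```python
from PIL import Image  # assumption: Pillow is installed

# Downscale to at most 1024x1024 before uploading; thumbnail()
# resizes in place, preserves aspect ratio, and never upscales.
img = Image.open('image.jpg')
img.thumbnail((1024, 1024))
img.save('image_small.jpg')
```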
### For GPU Users

- **GPU Usage**: Set `DEEPSEEK_OCR_DEVICE=cuda` for GPU acceleration
- **Batch Processing**: Process multiple images efficiently
- **Memory Management**: Monitor GPU memory usage for large images (see the sketch after this list)
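A rough way to check GPU memory from Python, again assuming a PyTorch backend:

```python
import torch

# Report current GPU memory usage on device 0.
if torch.cuda.is_available():
    used = torch.cuda.memory_allocated() / 1024**2
    total = torch.cuda.get_device_properties(0).total_memory / 1024**2
    print(f"GPU memory: {used:.0f} MiB used / {total:.0f} MiB total")
```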
## Error Handling

The system includes comprehensive error handling for:

- File type validation
- Model loading errors
- OCR processing failures
- Network connectivity issues
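On the client side, a defensive calling pattern might look like this (a sketch; the server's error payload format is not specified here):

```python
import requests

try:
    with open('image.jpg', 'rb') as f:
        response = requests.post(
            'http://localhost:8000/ocr/extract-text/',
            files={'file': f},
            timeout=120,  # OCR can take tens of seconds on CPU
        )
    response.raise_for_status()  # surface 4xx/5xx as exceptions
except requests.exceptions.RequestException as exc:
    print(f"OCR request failed: {exc}")
else:
    print(response.json())
```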
## Examples

### Basic Text Extraction

```python
import requests

# Upload an image and extract its text
with open('image.jpg', 'rb') as f:
    response = requests.post(
        'http://localhost:8000/ocr/extract-text/',
        files={'file': f}
    )

result = response.json()
print(result['extracted_text'])
```
### Query with OCR

```python
import requests

# Ask a question about previously extracted text
response = requests.post(
    'http://localhost:8000/ocr/query/',
    json={
        'query': 'What is the main topic?',
        'conversation_history': [],
        'extracted_text': 'Your extracted text here...'
    }
)
print(response.json())
```
## Troubleshooting

### Common Issues

- **Model Loading Error**: Ensure you have sufficient RAM/VRAM
- **CUDA Error**: Check GPU compatibility and drivers
- **Memory Error**: Reduce the image size or use CPU mode
- **Network Error**: Check your internet connection for the model download
### Debug Mode

Enable debug logging:

```python
import logging

logging.basicConfig(level=logging.DEBUG)
```
## Support
For issues and questions:
- Check the logs for error messages
- Verify your environment configuration
- Test with smaller images first
- Check GPU memory usage
## License

This integration uses DeepSeek OCR, which is licensed under Apache 2.0.