yadavkapil23 committed on
Commit fcfe360 · 1 Parent(s): d90bce4

updated req

Files changed (2)
  1. README_OCR.md +183 -0
  2. requirements.txt +10 -2
README_OCR.md ADDED
@@ -0,0 +1,183 @@
+ # DeepSeek OCR Integration
+
+ This document explains how to use the DeepSeek OCR integration in your RAG system.
+
+ ## Features
+
+ - **Text Extraction**: Extract text from images using DeepSeek OCR
+ - **Grounding**: Locate specific text within images
+ - **Markdown Conversion**: Convert document images to markdown format
+ - **RAG Integration**: Query the RAG system with OCR-extracted text
+ - **Multi-language Support**: Supports over 50 languages
+
+ ## API Endpoints
+
+ ### 1. Extract Text from Image
+ ```
+ POST /ocr/extract-text/
+ ```
+ - **Input**: Image file (multipart/form-data)
+ - **Optional**: Custom prompt
+ - **Output**: Extracted text
+
+ ### 2. Extract Text with Grounding
+ ```
+ POST /ocr/extract-with-grounding/
+ ```
+ - **Input**: Image file + target text (optional)
+ - **Output**: Text with location information
+
+ ### 3. Convert to Markdown
+ ```
+ POST /ocr/convert-to-markdown/
+ ```
+ - **Input**: Document image
+ - **Output**: Markdown formatted text
+
+ ### 4. Query with OCR Text
+ ```
+ POST /ocr/query/
+ ```
+ - **Input**: Query + conversation history + extracted text
+ - **Output**: RAG response enhanced with OCR text
+
+ ## Frontend Usage
+
+ 1. **Upload Image**: Click the "+" button in the input area
+ 2. **Select Image**: Choose an image file from your device
+ 3. **OCR Processing**: The system will automatically extract text
+ 4. **Options**:
+    - **Use Extracted Text**: Copy the text to the input field
+    - **Query with OCR**: Ask questions about the image content
+    - **Cancel**: Close the OCR modal
+
+ ## Configuration
+
+ Create a `.env` file with the following variables:
+
+ ```env
+ # DeepSeek OCR Configuration
+ DEEPSEEK_OCR_MODEL=deepseek-ai/DeepSeek-OCR
+ DEEPSEEK_OCR_DEVICE=auto # auto, cpu, cuda
+ DEEPSEEK_OCR_MAX_TOKENS=512
+ DEEPSEEK_OCR_TEMPERATURE=0.1
+
+ # Optional: Custom model path for local models
+ # DEEPSEEK_OCR_MODEL_PATH=/path/to/local/model
+
+ # Optional: Hugging Face token for private models
+ # HF_TOKEN=your_huggingface_token_here
+ ```
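
As a rough illustration of how an application might read these variables, here is a minimal sketch. `OCRSettings` and `load_ocr_settings` are hypothetical names that do not come from this integration; the defaults mirror the values shown above, and the sketch assumes the `.env` file has already been loaded into the process environment.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class OCRSettings:
    """Hypothetical container for the variables listed in the .env example."""
    model: str
    device: str
    max_tokens: int
    temperature: float
    model_path: Optional[str] = None


def load_ocr_settings(env: Optional[Mapping[str, str]] = None) -> OCRSettings:
    """Read OCR settings from a mapping (os.environ by default)."""
    env = os.environ if env is None else env
    device = env.get("DEEPSEEK_OCR_DEVICE", "auto").strip().lower()
    if device not in {"auto", "cpu", "cuda"}:
        raise ValueError(f"Unsupported DEEPSEEK_OCR_DEVICE: {device!r}")
    return OCRSettings(
        model=env.get("DEEPSEEK_OCR_MODEL", "deepseek-ai/DeepSeek-OCR"),
        device=device,
        max_tokens=int(env.get("DEEPSEEK_OCR_MAX_TOKENS", "512")),
        temperature=float(env.get("DEEPSEEK_OCR_TEMPERATURE", "0.1")),
        # An unset or empty path means "download the model by name".
        model_path=env.get("DEEPSEEK_OCR_MODEL_PATH") or None,
    )
```

Passing a plain dictionary instead of `os.environ` keeps the loader easy to exercise in tests without touching the real environment.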
+
+ ## Installation
+
+ 1. Install dependencies:
+    ```bash
+    pip install -r requirements.txt
+    ```
+
+ 2. Set up environment variables (optional):
+    ```bash
+    cp .env.example .env
+    # Edit .env with your configuration
+    ```
+
+ 3. Run the application:
+    ```bash
+    uvicorn main:app --reload
+    ```
+
+ ## Model Requirements
+
+ ### For CPU (Laptop) Setup:
+ - **RAM**: At least 8GB (16GB recommended)
+ - **Storage**: ~2GB for model download
+ - **CPU**: Modern multi-core processor (Intel i5/AMD Ryzen 5 or better)
+ - **Performance**: Expect 10-30 seconds per image on CPU
+
+ ### For GPU Setup:
+ - **GPU**: CUDA compatible (NVIDIA)
+ - **VRAM**: At least 4GB
+ - **RAM**: 16GB+ recommended
+ - **Performance**: Expect 2-5 seconds per image on GPU
+
+ ## Performance Tips
+
+ ### For CPU (Laptop) Users:
+ 1. **CPU Optimization**: Already configured for CPU usage
+ 2. **Image Size**: Keep images at or below 1024x1024 pixels for faster processing
+ 3. **Memory Management**: Close other applications to free up RAM
+ 4. **Model Caching**: The model is cached after first load
+ 5. **Processing Time**: Expect 10-30 seconds per image on CPU
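
The image-size tip comes down to simple arithmetic: scale both sides by the same factor so the longer side lands on the limit. The sketch below shows that calculation; `fit_within` is a hypothetical helper, not something this integration provides.

```python
from typing import Tuple


def fit_within(width: int, height: int, max_side: int = 1024) -> Tuple[int, int]:
    """Return dimensions scaled down to fit a max_side x max_side box.

    Aspect ratio is preserved and images that already fit are left unchanged.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    scale = min(max_side / width, max_side / height, 1.0)
    # Round to the nearest pixel, but never collapse a side to zero.
    return max(1, round(width * scale)), max(1, round(height * scale))


print(fit_within(4000, 3000))  # (1024, 768)
print(fit_within(800, 600))    # (800, 600)
```

With Pillow, which this commit adds to `requirements.txt`, `Image.thumbnail((1024, 1024))` applies the same kind of aspect-preserving downscale in place before the image is uploaded.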
+
+ ### For GPU Users:
+ 1. **GPU Usage**: Set `DEEPSEEK_OCR_DEVICE=cuda` for GPU acceleration
+ 2. **Batch Processing**: Process multiple images efficiently
+ 3. **Memory Management**: Monitor GPU memory usage for large images
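
This README does not say how the `auto` value of `DEEPSEEK_OCR_DEVICE` is interpreted. One plausible reading, sketched here as an assumption, is "use CUDA when it is available, otherwise fall back to CPU". `resolve_device` is a hypothetical helper; the availability flag is passed in so the logic stays independent of PyTorch.

```python
def resolve_device(requested: str, cuda_available: bool) -> str:
    """Map a DEEPSEEK_OCR_DEVICE value to a concrete device name."""
    requested = requested.strip().lower()
    if requested not in {"auto", "cpu", "cuda"}:
        raise ValueError(f"Unsupported device setting: {requested!r}")
    if requested == "auto":
        # Assumed behaviour: prefer the GPU, fall back to the CPU.
        return "cuda" if cuda_available else "cpu"
    if requested == "cuda" and not cuda_available:
        raise RuntimeError("DEEPSEEK_OCR_DEVICE=cuda but no CUDA device is available")
    return requested
```

In an application that already imports PyTorch, the flag would typically come from `torch.cuda.is_available()`, for example `resolve_device(os.getenv("DEEPSEEK_OCR_DEVICE", "auto"), torch.cuda.is_available())`.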
+
+ ## Error Handling
+
+ The system handles the following error cases:
+ - File type validation
+ - Model loading errors
+ - OCR processing failures
+ - Network connectivity issues
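
To make the first item concrete, here is a minimal sketch of what file type validation could look like. The allow-list and the `validate_image_upload` name are assumptions made for illustration; the checks the integration actually performs are not shown in this commit.

```python
from pathlib import Path

# Assumed allow-list; the formats the integration actually accepts may differ.
ALLOWED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp", ".bmp"}


def validate_image_upload(filename: str, content_type: str) -> None:
    """Raise ValueError if the upload does not look like a supported image."""
    suffix = Path(filename).suffix.lower()
    if suffix not in ALLOWED_EXTENSIONS:
        raise ValueError(f"Unsupported file extension: {suffix or '(none)'}")
    if not content_type.lower().startswith("image/"):
        raise ValueError(f"Unsupported content type: {content_type}")
```

In a web handler, a `ValueError` like this would usually be translated into a 4xx response before the model is invoked, so that invalid uploads fail fast.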
+
+ ## Examples
+
+ ### Basic Text Extraction
+ ```python
+ import requests
+
+ # Upload image and extract text
+ with open('image.jpg', 'rb') as f:
+     response = requests.post(
+         'http://localhost:8000/ocr/extract-text/',
+         files={'file': f}
+     )
+
+ result = response.json()
+ print(result['extracted_text'])
+ ```
+
+ ### Query with OCR
+ ```python
+ import requests
+
+ # Query about extracted text
+ response = requests.post(
+     'http://localhost:8000/ocr/query/',
+     json={
+         'query': 'What is the main topic?',
+         'conversation_history': [],
+         'extracted_text': 'Your extracted text here...'
+     }
+ )
+ ```
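
If adding `requests` as a client dependency is not an option, the same query can be built with the standard library. This is only a sketch: `build_ocr_query` is a hypothetical helper, and the payload keys and URL are copied from the example above.

```python
import json
import urllib.request
from typing import List, Optional

BASE_URL = "http://localhost:8000"  # same host and port as the examples above


def build_ocr_query(
    query: str,
    extracted_text: str,
    history: Optional[List[dict]] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST request for /ocr/query/."""
    payload = {
        "query": query,
        "conversation_history": list(history or []),
        "extracted_text": extracted_text,
    }
    return urllib.request.Request(
        f"{BASE_URL}/ocr/query/",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending it is then a matter of `urllib.request.urlopen(build_ocr_query(...), timeout=60)` and decoding the JSON body of the response.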
+
+ ## Troubleshooting
+
+ ### Common Issues
+
+ 1. **Model Loading Error**: Ensure you have sufficient RAM/VRAM
+ 2. **CUDA Error**: Check GPU compatibility and drivers
+ 3. **Memory Error**: Reduce image size or use CPU mode
+ 4. **Network Error**: Check internet connection for model download
+
+ ### Debug Mode
+
+ Enable debug logging:
+ ```python
+ import logging
+ logging.basicConfig(level=logging.DEBUG)
+ ```
+
+ ## Support
+
+ For issues and questions:
+ 1. Check the logs for error messages
+ 2. Verify your environment configuration
+ 3. Test with smaller images first
+ 4. Check GPU memory usage
+
+ ## License
+
+ This integration uses DeepSeek OCR which is licensed under Apache 2.0.
requirements.txt CHANGED
@@ -10,5 +10,13 @@ wikipedia
  pypdf
  sentence-transformers
  torch
- transformers
- accelerate
+ transformers>=4.36.0
+ accelerate
+ Pillow
+ python-multipart
+ aiofiles
+ addict
+ einops
+ easydict
+ matplotlib
+ torchvision