dots-ocr-idcard / DEPLOYMENT.md

tommulder

Prepare for Hugging Face Spaces deployment

e300623 2 months ago

Dots.OCR Service - Hugging Face Spaces Deployment Guide

✅ Ready for Deployment

The dots-ocr service is now fully self-contained and ready for deployment to Hugging Face Spaces.

Files Updated

app.py - Fixed import paths to be self-contained
models.py - Created local data structures (ExtractedField, IdCardFields, MRZData)
field_extraction.py - Created local field extraction module
Dockerfile - Updated for HF compliance with proper user permissions
README.md - Updated with proper HF Spaces configuration
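
The three local data structures named above could look roughly like the following. This is only a sketch: the class names come from this guide, but every attribute name and type is an assumption made for illustration, and the real models.py may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExtractedField:
    """One extracted value with a confidence score (attributes are hypothetical)."""
    field_name: str
    value: Optional[str] = None
    confidence: float = 0.0  # assumed to lie in [0, 1]


@dataclass
class MRZData:
    """Machine Readable Zone content (attributes are hypothetical)."""
    raw_text: str
    confidence: float = 0.0


@dataclass
class IdCardFields:
    """Structured fields of an ID card (the field set is hypothetical)."""
    surname: Optional[ExtractedField] = None
    given_names: Optional[ExtractedField] = None
    date_of_birth: Optional[ExtractedField] = None
    document_number: Optional[ExtractedField] = None


# Usage: fields that were not found simply stay None.
card = IdCardFields(surname=ExtractedField("surname", "DOE", 0.97))
```

Keeping these definitions inside the Space is what removes the dependency on the parent repository.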

Deployment Steps

1. Create Hugging Face Space

# Login to Hugging Face
huggingface-cli login

# Create a new Space
huggingface-cli repo create dots-ocr-idcard --type space --space_sdk docker --organization algoryn

2. Deploy to HF Space

# Clone the space locally
git clone https://huggingface.co/spaces/algoryn/dots-ocr-idcard
cd dots-ocr-idcard

# Copy all files from your local checkout of this repository
# (the path below is from the author's machine; adjust it to yours)
cp -r /Users/tmulder/Sources/Algoryn/kybtech-dots-ocr/* .

# Commit and push
git add .
git commit -m "Deploy Dots.OCR text extraction service"
git push

3. Test the Deployment

Once the Space has finished building and started (usually 5-10 minutes), test it with:

# Basic OCR test
curl -X POST https://algoryn-dots-ocr-idcard.hf.space/v1/id/ocr \
  -H "Authorization: Bearer YOUR_HF_TOKEN" \
  -F "file=@test_image.jpg"

# With ROI (region of interest)
curl -X POST https://algoryn-dots-ocr-idcard.hf.space/v1/id/ocr \
  -H "Authorization: Bearer YOUR_HF_TOKEN" \
  -F "file=@test_image.jpg" \
  -F 'roi={"x1":0.1,"y1":0.1,"x2":0.9,"y2":0.9}'
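
The same requests can be sent from Python. The sketch below mirrors the curl calls above; it assumes that the ROI coordinates are fractions of the image width and height in [0, 1] (suggested by the 0.1 to 0.9 values, but not stated in this guide) and that the third-party `requests` package is installed. The helper names `make_roi` and `post_ocr` are hypothetical.

```python
import json


def make_roi(x1: float, y1: float, x2: float, y2: float) -> str:
    """Serialize an ROI as the JSON string sent in the `roi` form field.

    Assumption: coordinates are normalized fractions in [0, 1].
    """
    if not (0.0 <= x1 < x2 <= 1.0 and 0.0 <= y1 < y2 <= 1.0):
        raise ValueError("ROI must satisfy 0 <= x1 < x2 <= 1 and 0 <= y1 < y2 <= 1")
    return json.dumps({"x1": x1, "y1": y1, "x2": x2, "y2": y2}, separators=(",", ":"))


def post_ocr(image_path, token, roi=None,
             base_url="https://algoryn-dots-ocr-idcard.hf.space"):
    """POST an image (and an optional ROI) to /v1/id/ocr and return the response."""
    import requests  # third-party dependency: pip install requests

    data = {"roi": roi} if roi is not None else {}
    with open(image_path, "rb") as image_file:
        response = requests.post(
            f"{base_url}/v1/id/ocr",
            headers={"Authorization": f"Bearer {token}"},
            files={"file": image_file},  # same as curl -F "file=@..."
            data=data,                   # same as curl -F 'roi=...'
            timeout=60,
        )
    response.raise_for_status()
    return response
```

For example, `post_ocr("test_image.jpg", "YOUR_HF_TOKEN", roi=make_roi(0.1, 0.1, 0.9, 0.9))` corresponds to the second curl command.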

Features

Self-contained: No external dependencies on parent repository
HF Compliant: Follows Hugging Face Docker Spaces best practices
Mock Mode: Falls back to mock implementation if Dots.OCR fails to load
ROI Support: Process pre-cropped images or full images with ROI coordinates
Field Extraction: Structured field extraction with confidence scores
MRZ Detection: Machine Readable Zone data extraction
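
The mock-mode fallback listed above can be pictured as a guarded loader. This is a minimal sketch of the general pattern, not the service's actual code: `MockOcrEngine`, `load_engine`, and the `extract_text` method are hypothetical names.

```python
import logging

logger = logging.getLogger("dots_ocr")


class MockOcrEngine:
    """Stand-in engine that returns a fixed placeholder result (hypothetical)."""
    is_mock = True

    def extract_text(self, image_bytes: bytes) -> str:
        return "MOCK OCR OUTPUT"


def load_engine(load_real_engine):
    """Return the real engine, or a mock engine if loading it raises."""
    try:
        return load_real_engine()
    except Exception as exc:  # e.g. missing weights or a failed import
        logger.warning("Dots.OCR failed to load (%s); using mock engine", exc)
        return MockOcrEngine()


# Usage: a loader that fails yields the mock instead of crashing the app.
def broken_loader():
    raise RuntimeError("model weights not found")


engine = load_engine(broken_loader)
```

A fallback like this keeps the Space responsive when the model cannot load, so it is worth checking whether a deployment is answering with mock output before trusting its results.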

API Endpoints

GET /health - Health check
POST /v1/id/ocr - Text extraction with optional ROI
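
Because a freshly pushed Space takes several minutes to build, it can help to poll GET /health before sending OCR requests. The following is a sketch using only the standard library; it assumes that /health answers with HTTP 200 once the service is up and that a private Space accepts the same bearer token as the OCR endpoint. Neither detail is confirmed by this guide, and the function names are hypothetical.

```python
import time
import urllib.request


def is_healthy(base_url, token=None, timeout=10.0):
    """Return True if GET {base_url}/health answers with HTTP 200."""
    headers = {"Authorization": f"Bearer {token}"} if token else {}
    request = urllib.request.Request(f"{base_url}/health", headers=headers)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except OSError:  # covers URLError, HTTPError, and timeouts
        return False


def wait_until_healthy(check, attempts=40, delay=15.0, sleep=time.sleep):
    """Call check() up to `attempts` times, sleeping `delay` seconds between tries."""
    for _ in range(attempts):
        if check():
            return True
        sleep(delay)
    return False
```

With the defaults (40 attempts, 15 seconds apart) the loop gives up after about 10 minutes, for example `wait_until_healthy(lambda: is_healthy("https://algoryn-dots-ocr-idcard.hf.space", "YOUR_HF_TOKEN"))`.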

Environment Variables

No special environment variables are needed. The service listens on port 7860 by default.

Performance

GPU: 300-900 ms processing time
CPU: 3-8 s processing time
Memory: ~6 GB per instance

Privacy

This endpoint processes images temporarily and does not store or log personal information. All field values are redacted in logs for privacy protection.
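
One way to keep field values out of logs is to redact them before anything is written. The sketch below illustrates the idea only; the dictionary layout (`value` and `confidence` keys) and the `redact_fields` helper are assumptions, not the service's actual logging code.

```python
import logging

logger = logging.getLogger("dots_ocr")

REDACTED = "[REDACTED]"


def redact_fields(fields: dict) -> dict:
    """Return a copy of `fields` with every non-empty value replaced by a marker.

    Field names and confidence scores are kept so logs stay useful for debugging.
    """
    return {
        name: {**info, "value": REDACTED if info.get("value") is not None else None}
        for name, info in fields.items()
    }


# Usage: log only the redacted copy, never the raw extraction result.
raw = {
    "surname": {"value": "DOE", "confidence": 0.97},
    "document_number": {"value": None, "confidence": 0.0},
}
safe = redact_fields(raw)
logger.info("extracted fields: %s", safe)
```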