Commit 34cedd8
1 Parent(s): 1e32a60
Add support for reasoning trace display from NuMarkdown-8B-Thinking model
- Created ReasoningParser module to detect and parse <think>/<answer> tags
- Added collapsible reasoning panel UI with formatted step display
- Automatically separates reasoning from final output for cleaner view
- Shows reasoning statistics (word count, percentage of output)
- Added india-medical-ocr-test dataset to examples
- Styled reasoning sections with dark mode support
- Includes reasoning trace indicator badge in statistics panel
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
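
For reference, the tag structure the new parser targets resembles the following sketch (illustrative only; real NuMarkdown-8B-Thinking output is much longer and its wording varies):

```
<think>
1. **Analyse the page layout** Identify columns, tables, and handwriting.
2. **Transcribe the content** Convert each region to markdown.
</think>
<answer>
# Document title

| Field | Value |
|-------|-------|
</answer>
```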
- CLAUDE.md +89 -2
- css/styles.css +49 -0
- index.html +68 -2
- js/app.js +65 -3
- js/reasoning-parser.js +224 -0
- linkedin-post.txt +18 -0
- mobile-enhancement-plan.md +237 -0
- multi-ocr-comparison-ui-patterns.md +277 -0
CLAUDE.md
CHANGED
@@ -6,6 +6,32 @@ This file provides guidance to Claude Code (claude.ai/code) when working with th
 
 OCR Text Explorer is a modern, standalone web application for browsing and comparing OCR text improvements in HuggingFace datasets. Built as a lightweight alternative to the Gradio-based OCR Time Machine, it focuses specifically on exploring pre-OCR'd datasets with enhanced user experience.
 
+## Recent Updates
+
+### Markdown Rendering Support (Added 2025-08-01)
+
+The application now supports rendering markdown-formatted VLM output for improved readability:
+
+**Features:**
+- Automatic markdown detection in improved OCR text
+- Toggle button to switch between raw markdown and rendered view
+- Support for common markdown elements: headers, lists, tables, code blocks, links
+- Security-focused implementation with XSS prevention
+- Performance optimization with render caching
+
+**Implementation Details:**
+- Uses marked.js library for markdown parsing
+- Custom renderers for security (sanitizes URLs, prevents script injection)
+- Tailwind-styled markdown elements matching the app's design
+- HTML table support for VLM outputs that use table tags
+- Cache system limits memory usage to 50 rendered items
+
+**UI Changes:**
+- Markdown toggle button appears when markdown is detected
+- "Markdown Detected" badge in statistics panel
+- New "Markdown Diff" mode showing plain vs rendered comparison
+- Both "Improved Only" and "Side by Side" views support rendering
+
 ## Architecture
 
 ### Technology Stack
@@ -123,6 +149,23 @@ case 'your_key':
 // Dark mode: bg-red-950, text-red-300
 ```
 
+### Working with Markdown Rendering
+```javascript
+// Enable/disable markdown rendering
+this.renderMarkdown = true; // Toggle markdown rendering
+
+// Add new markdown patterns to detection
+// In app.js detectMarkdown() method
+const markdownPatterns = [
+  /your_pattern_here/, // Add your pattern
+  // ... existing patterns
+];
+
+// Customize markdown styles
+// In app.js renderMarkdownText() method
+html = html.replace(/<your_element>/g, '<your_element class="your-tailwind-classes">');
+```
+
 ## Performance Optimizations
 
 1. **Direct Dataset Indexing**: Uses `dataset[index]` instead of loading batches into memory
@@ -146,8 +189,33 @@ case 'your_key':
 **Cause**: Signed URLs expire after ~1 hour
 **Fix**: Implemented handleImageError() with automatic URL refresh
 
+### Issue: Markdown tables not rendering
+**Cause**: Default marked.js settings and HTML security restrictions
+**Fix**:
+- Enabled `tables: true` in marked.js options
+- Added safe HTML table tag allowlist in renderer
+- Applied proper Tailwind CSS classes to table elements
+- Added CSS overrides for prose container compatibility
+
+## Mobile Support Status
+
+While the application claims responsive design, the current mobile support is limited. A comprehensive mobile enhancement is planned but not yet implemented. See [mobile-enhancement-plan.md](mobile-enhancement-plan.md) for detailed technical requirements and implementation approach.
+
+**Current limitations:**
+- Fixed desktop layout doesn't adapt well to small screens
+- No touch gesture support for navigation
+- Small touch targets for buttons and inputs
+- Desktop-only interactions (hover states, keyboard shortcuts)
+
+**Planned improvements:**
+- Responsive stacked layout for mobile devices
+- Touch gestures (swipe for navigation)
+- Mobile-optimized navigation bar
+- Touch-friendly UI components
+
 ## Future Enhancements
 
+- [ ] Comprehensive mobile support (see mobile-enhancement-plan.md)
 - [ ] Search/filter within dataset
 - [ ] Bookmark favorite samples
 - [ ] Export selected texts
@@ -178,9 +246,28 @@ npx serve .
 ## Testing Datasets
 
 Known working datasets:
-- `davanstrien/exams-ocr` - Default dataset with
+- `davanstrien/exams-ocr` - Default dataset with exam papers (uses `text` and `markdown` columns)
+- `davanstrien/rolm-test` - Victorian theatre playbills processed with RolmOCR (uses `text` and `rolmocr_text` columns, includes `inference_info` metadata)
 - Any dataset with image + text columns
 
 Column patterns automatically detected:
 - Original: `text`, `ocr`, `original_text`, `ground_truth`
-- Improved: `markdown`, `new_ocr`, `corrected_text`, `vlm_ocr`
+- Improved: `markdown`, `new_ocr`, `corrected_text`, `vlm_ocr`, `rolmocr_text`
+- Metadata: `inference_info` (JSON array with model details, processing date, parameters)
+
+## Recent Updates
+
+### Model Information Display (Added 2025-08-04)
+
+The application now displays model processing information when available:
+
+**Features:**
+- Automatic detection of `inference_info` column
+- Model metadata panel showing: model name, processing date, batch size, max tokens
+- Link to processing script when available
+- Positioned prominently below image for immediate visibility
+
+**Implementation Notes:**
+- The model info panel only appears when `inference_info` column exists
+- Supports datasets processed with UV scripts via HF Jobs
+- Gracefully handles datasets without model metadata
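The `inference_info` column is only described above as a JSON array with model details, processing date, and parameters; the exact field names are not part of this commit. A minimal reading sketch, with hypothetical keys, might look like:

```javascript
// Hypothetical inference_info entry; real key names depend on the UV script
// that produced the dataset and are not specified in this commit.
const sample = {
  inference_info: JSON.stringify([
    {
      model: 'NuMarkdown-8B-Thinking',
      processing_date: '2025-08-04',
      batch_size: 16,
      max_tokens: 4096,
      script_url: 'https://example.com/process.py' // placeholder link
    }
  ])
};

// Guarded read: only render the panel when the column exists and parses.
let info = null;
try {
  const parsed = JSON.parse(sample.inference_info || '[]');
  info = Array.isArray(parsed) ? parsed[0] : null;
} catch (e) {
  info = null; // gracefully handle datasets without usable metadata
}

if (info) {
  console.log(`${info.model} (processed ${info.processing_date})`);
}
```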
css/styles.css
CHANGED
@@ -48,6 +48,55 @@ body {
   word-break: break-word;
 }
 
+/* Reasoning trace styling */
+.reasoning-panel {
+  @apply bg-gradient-to-r from-blue-50 to-indigo-50 dark:from-blue-950/20 dark:to-indigo-950/20;
+  @apply border-l-4 border-blue-500 dark:border-blue-400;
+}
+
+.reasoning-step {
+  @apply transition-all hover:bg-gray-50 dark:hover:bg-gray-800/50 rounded-md p-2 -m-2;
+}
+
+.reasoning-step-number {
+  @apply inline-flex items-center justify-center w-7 h-7;
+  @apply bg-gradient-to-br from-blue-500 to-indigo-600;
+  @apply text-white text-xs font-bold rounded-full;
+  @apply shadow-sm;
+}
+
+.reasoning-step-title {
+  @apply font-semibold text-gray-900 dark:text-gray-100;
+  @apply border-b border-gray-200 dark:border-gray-700 pb-1 mb-2;
+}
+
+.reasoning-step-content {
+  @apply text-sm text-gray-700 dark:text-gray-300;
+  @apply leading-relaxed;
+}
+
+/* Collapse animation for reasoning panel */
+[x-collapse] {
+  overflow: hidden;
+  transition: max-height 0.3s ease-out;
+}
+
+[x-collapse].collapsed {
+  max-height: 0;
+}
+
+/* Reasoning trace indicators */
+.reasoning-indicator {
+  @apply animate-pulse;
+}
+
+.reasoning-badge {
+  @apply inline-flex items-center px-3 py-1 rounded-full text-xs font-medium;
+  @apply bg-gradient-to-r from-blue-100 to-indigo-100 dark:from-blue-900 dark:to-indigo-900;
+  @apply text-blue-800 dark:text-blue-200;
+  @apply border border-blue-200 dark:border-blue-700;
+}
+
 /* Keyboard hint styling */
 kbd {
   @apply inline-block px-2 py-1 text-xs font-semibold text-gray-800 bg-gray-100 border border-gray-300 rounded dark:bg-gray-700 dark:text-gray-200 dark:border-gray-600;
index.html
CHANGED
@@ -314,13 +314,19 @@
             <span x-text="wordStats.original || '-'"></span> → <span x-text="wordStats.improved || '-'"></span>
           </span>
         </div>
-        <div
-        <span class="inline-flex items-center px-2.5 py-0.5 rounded-full text-xs font-medium bg-purple-100 dark:bg-purple-900 text-purple-800 dark:text-purple-200">
+        <div class="mt-2 flex items-center justify-center space-x-2">
+          <span x-show="hasMarkdown" class="inline-flex items-center px-2.5 py-0.5 rounded-full text-xs font-medium bg-purple-100 dark:bg-purple-900 text-purple-800 dark:text-purple-200">
            <svg class="w-3 h-3 mr-1" fill="none" stroke="currentColor" viewBox="0 0 24 24">
              <path stroke-linecap="round" stroke-linejoin="round" stroke-width="2" d="M9 12h6m-6 4h6m2 5H7a2 2 0 01-2-2V5a2 2 0 012-2h5.586a1 1 0 01.707.293l5.414 5.414a1 1 0 01.293.707V19a2 2 0 01-2 2z"></path>
            </svg>
            Markdown Detected
          </span>
+          <span x-show="hasReasoningTrace" class="inline-flex items-center px-2.5 py-0.5 rounded-full text-xs font-medium bg-blue-100 dark:bg-blue-900 text-blue-800 dark:text-blue-200">
+            <svg class="w-3 h-3 mr-1" fill="none" stroke="currentColor" viewBox="0 0 24 24">
+              <path stroke-linecap="round" stroke-linejoin="round" stroke-width="2" d="M9.663 17h4.673M12 3v1m6.364 1.636l-.707.707M21 12h-1M4 12H3m3.343-5.657l-.707-.707m2.828 9.9a5 5 0 117.072 0l-.548.547A3.374 3.374 0 0014 18.469V19a2 2 0 11-4 0v-.531c0-.895-.356-1.754-.988-2.386l-.548-.547z"></path>
+            </svg>
+            Reasoning Trace
+          </span>
         </div>
       </div>
     </div>
@@ -390,6 +396,65 @@
 
       <!-- Improved Only -->
       <div x-show="activeTab === 'improved'" class="max-w-none">
+        <!-- Reasoning Trace Panel -->
+        <div x-show="hasReasoningTrace" class="mb-4">
+          <div class="bg-blue-50 dark:bg-blue-950/20 border border-blue-200 dark:border-blue-800 rounded-lg">
+            <button
+              @click="showReasoning = !showReasoning"
+              class="w-full px-4 py-3 flex items-center justify-between text-left hover:bg-blue-100 dark:hover:bg-blue-950/40 transition-colors rounded-t-lg"
+            >
+              <div class="flex items-center space-x-2">
+                <svg class="w-5 h-5 text-blue-600 dark:text-blue-400" fill="none" stroke="currentColor" viewBox="0 0 24 24">
+                  <path stroke-linecap="round" stroke-linejoin="round" stroke-width="2" d="M9.663 17h4.673M12 3v1m6.364 1.636l-.707.707M21 12h-1M4 12H3m3.343-5.657l-.707-.707m2.828 9.9a5 5 0 117.072 0l-.548.547A3.374 3.374 0 0014 18.469V19a2 2 0 11-4 0v-.531c0-.895-.356-1.754-.988-2.386l-.548-.547z"></path>
+                </svg>
+                <span class="font-medium text-gray-900 dark:text-gray-100">Model Reasoning</span>
+                <span class="text-sm text-gray-600 dark:text-gray-400" x-show="reasoningStats">
+                  (<span x-text="reasoningStats?.reasoningWords"></span> words, <span x-text="reasoningStats?.reasoningRatio"></span>% of output)
+                </span>
+              </div>
+              <svg
+                class="w-5 h-5 text-gray-500 dark:text-gray-400 transition-transform"
+                :class="showReasoning ? 'rotate-180' : ''"
+                fill="none" stroke="currentColor" viewBox="0 0 24 24"
+              >
+                <path stroke-linecap="round" stroke-linejoin="round" stroke-width="2" d="M19 9l-7 7-7-7"></path>
+              </svg>
+            </button>
+
+            <div x-show="showReasoning" x-collapse class="px-4 pb-4">
+              <div class="bg-white dark:bg-gray-800 rounded-lg p-4 mt-2">
+                <template x-if="formattedReasoning && formattedReasoning.steps.length > 0">
+                  <div class="space-y-3">
+                    <template x-for="(step, index) in formattedReasoning.steps" :key="index">
+                      <div class="pl-4 border-l-2 border-gray-200 dark:border-gray-700">
+                        <div class="font-medium text-sm text-gray-900 dark:text-gray-100 mb-1">
+                          <span class="inline-block w-6 h-6 bg-blue-100 dark:bg-blue-900 text-blue-600 dark:text-blue-400 rounded-full text-center text-xs leading-6 mr-2" x-text="step.number || (index + 1)"></span>
+                          <span x-text="step.title"></span>
+                        </div>
+                        <div class="text-sm text-gray-700 dark:text-gray-300 whitespace-pre-wrap" x-text="step.content"></div>
+                      </div>
+                    </template>
+                  </div>
+                </template>
+
+                <template x-if="!formattedReasoning || formattedReasoning.steps.length === 0">
+                  <pre class="whitespace-pre-wrap font-mono text-xs text-gray-700 dark:text-gray-300" x-text="reasoningContent"></pre>
+                </template>
+              </div>
+            </div>
+          </div>
+        </div>
+
+        <!-- Final Answer Content -->
+        <div x-show="hasReasoningTrace" class="mb-2">
+          <div class="flex items-center space-x-2 text-sm text-gray-600 dark:text-gray-400 mb-2">
+            <svg class="w-4 h-4" fill="none" stroke="currentColor" viewBox="0 0 24 24">
+              <path stroke-linecap="round" stroke-linejoin="round" stroke-width="2" d="M9 12l2 2 4-4m6 2a9 9 0 11-18 0 9 9 0 0118 0z"></path>
+            </svg>
+            <span>Final Output</span>
+          </div>
+        </div>
+
         <div x-show="!renderMarkdown">
           <pre class="whitespace-pre-wrap font-mono text-xs bg-gray-50 dark:bg-gray-800 text-gray-900 dark:text-gray-100 p-4 rounded-lg" x-text="getImprovedText()"></pre>
         </div>
@@ -532,6 +597,7 @@
     <!-- Local Scripts -->
     <script src="js/diff-utils.js"></script>
     <script src="js/dataset-api.js"></script>
+    <script src="js/reasoning-parser.js"></script>
     <script src="js/app.js"></script>
   </body>
 </html>
js/app.js
CHANGED
@@ -12,7 +12,8 @@ document.addEventListener('alpine:init', () => {
         // Example datasets
         exampleDatasets: [
             { id: 'davanstrien/exams-ocr', name: 'Exams OCR', description: 'Historical exam papers with VLM corrections' },
-            { id: 'davanstrien/rolm-test', name: 'ROLM Test', description: 'Documents processed with RolmOCR model' }
+            { id: 'davanstrien/rolm-test', name: 'ROLM Test', description: 'Documents processed with RolmOCR model' },
+            { id: 'davanstrien/india-medical-ocr-test', name: 'India Medical OCR', description: 'Medical documents with NuMarkdown reasoning traces' }
         ],
 
         // Navigation state
@@ -33,6 +34,14 @@ document.addEventListener('alpine:init', () => {
         renderMarkdown: false,
         hasMarkdown: false,
 
+        // Reasoning trace state
+        hasReasoningTrace: false,
+        showReasoning: false,
+        reasoningContent: null,
+        answerContent: null,
+        reasoningStats: null,
+        formattedReasoning: null,
+
         // Flow view state
         flowItems: [],
         flowStartIndex: 0,
@@ -190,9 +199,10 @@ document.addEventListener('alpine:init', () => {
             console.log('Column info:', this.columnInfo);
             console.log('Current sample keys:', Object.keys(this.currentSample));
 
-            // Check if improved text contains markdown
+            // Check if improved text contains markdown and reasoning traces
             const improvedText = this.getImprovedText();
-            this.
+            this.parseReasoningTrace(improvedText);
+            this.hasMarkdown = this.detectMarkdown(this.answerContent || improvedText);
 
             // Update diff when sample changes
             this.updateDiff();
@@ -279,6 +289,38 @@ document.addEventListener('alpine:init', () => {
             };
         },
 
+        parseReasoningTrace(text) {
+            // Reset reasoning state
+            this.hasReasoningTrace = false;
+            this.reasoningContent = null;
+            this.answerContent = null;
+            this.reasoningStats = null;
+            this.formattedReasoning = null;
+
+            if (!text || !window.ReasoningParser) return;
+
+            // Check if text contains reasoning trace
+            if (ReasoningParser.detectReasoningTrace(text)) {
+                const parsed = ReasoningParser.parseReasoningContent(text);
+
+                if (parsed.hasReasoning) {
+                    this.hasReasoningTrace = true;
+                    this.reasoningContent = parsed.reasoning;
+                    this.answerContent = parsed.answer;
+                    this.formattedReasoning = ReasoningParser.formatReasoningSteps(parsed.reasoning);
+                    this.reasoningStats = ReasoningParser.getReasoningStats(parsed);
+
+                    console.log('Reasoning trace detected:', this.reasoningStats);
+                } else {
+                    // No reasoning found, use original text as answer
+                    this.answerContent = text;
+                }
+            } else {
+                // No reasoning markers, use original text
+                this.answerContent = text;
+            }
+        },
+
         getOriginalText() {
             if (!this.currentSample) return '';
             const columns = this.api.detectColumns(null, this.currentSample);
@@ -286,6 +328,17 @@ document.addEventListener('alpine:init', () => {
         },
 
         getImprovedText() {
+            if (!this.currentSample) return '';
+            const columns = this.api.detectColumns(null, this.currentSample);
+            const rawText = this.currentSample[columns.improvedText] || 'No improved text found';
+
+            // If we have parsed answer content from reasoning trace, use that
+            // Otherwise return the raw text
+            return this.hasReasoningTrace && this.answerContent ? this.answerContent : rawText;
+        },
+
+        getRawImprovedText() {
+            // Get the raw improved text without parsing reasoning traces
             if (!this.currentSample) return '';
             const columns = this.api.detectColumns(null, this.currentSample);
             return this.currentSample[columns.improvedText] || 'No improved text found';
@@ -564,6 +617,15 @@ document.addEventListener('alpine:init', () => {
             content += `${'='.repeat(50)}\n`;
             content += original;
             content += `\n\n${'='.repeat(50)}\n\n`;
+
+            // Include reasoning trace if available
+            if (this.hasReasoningTrace && this.reasoningContent) {
+                content += `MODEL REASONING:\n`;
+                content += `${'='.repeat(50)}\n`;
+                content += this.reasoningContent;
+                content += `\n\n${'='.repeat(50)}\n\n`;
+            }
+
             content += `IMPROVED OCR:\n`;
             content += `${'='.repeat(50)}\n`;
             content += improved;
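A standalone sketch (not the app's actual Alpine wiring) of the precedence rule the updated getImprovedText() implements: once a reasoning trace is parsed, downstream consumers such as the diff view, markdown rendering, and export see only the answer content, while getRawImprovedText() still exposes the untouched column value.

```javascript
// Field names mirror the component state added above, but this object is a stub.
const state = {
  hasReasoningTrace: true,
  answerContent: '# Patient Record\n\n| Field | Value |',
  rawText: '<think>1. **Check layout** ...</think><answer># Patient Record\n\n| Field | Value |</answer>'
};

function improvedTextFor(s) {
  // Mirrors: return this.hasReasoningTrace && this.answerContent ? this.answerContent : rawText;
  return s.hasReasoningTrace && s.answerContent ? s.answerContent : s.rawText;
}

console.log(improvedTextFor(state));                                  // answer only
console.log(improvedTextFor({ ...state, hasReasoningTrace: false })); // raw column value
```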
js/reasoning-parser.js
ADDED
@@ -0,0 +1,224 @@
/**
 * Reasoning Trace Parser
 * Handles parsing and formatting of model reasoning traces from OCR outputs
 */

class ReasoningParser {
  /**
   * Detect if text contains reasoning trace markers
   * @param {string} text - The text to check
   * @returns {boolean} - True if reasoning trace is detected
   */
  static detectReasoningTrace(text) {
    if (!text || typeof text !== 'string') return false;

    // Check for common reasoning trace patterns
    const patterns = [
      /<think>/i,
      /<thinking>/i,
      /<reasoning>/i,
      /<thought>/i
    ];

    return patterns.some(pattern => pattern.test(text));
  }

  /**
   * Parse reasoning content from text
   * @param {string} text - The text containing reasoning trace
   * @returns {object} - Object with reasoning and answer sections
   */
  static parseReasoningContent(text) {
    if (!text) {
      return { reasoning: null, answer: null, original: text };
    }

    // Try multiple patterns for flexibility
    const patterns = [
      {
        start: /<think>/i,
        end: /<\/think>/i,
        answerStart: /<answer>/i,
        answerEnd: /<\/answer>/i
      },
      {
        start: /<thinking>/i,
        end: /<\/thinking>/i,
        answerStart: /<answer>/i,
        answerEnd: /<\/answer>/i
      },
      {
        start: /<reasoning>/i,
        end: /<\/reasoning>/i,
        answerStart: /<output>/i,
        answerEnd: /<\/output>/i
      }
    ];

    for (const pattern of patterns) {
      const reasoningMatch = text.match(new RegExp(
        pattern.start.source + '([\\s\\S]*?)' + pattern.end.source,
        'i'
      ));

      const answerMatch = text.match(new RegExp(
        pattern.answerStart.source + '([\\s\\S]*?)' + pattern.answerEnd.source,
        'i'
      ));

      if (reasoningMatch || answerMatch) {
        return {
          reasoning: reasoningMatch ? reasoningMatch[1].trim() : null,
          answer: answerMatch ? answerMatch[1].trim() : null,
          hasReasoning: !!reasoningMatch,
          hasAnswer: !!answerMatch,
          original: text
        };
      }
    }

    // If no patterns match, return original text as answer
    return {
      reasoning: null,
      answer: text,
      hasReasoning: false,
      hasAnswer: true,
      original: text
    };
  }

  /**
   * Format reasoning steps for display
   * @param {string} reasoningText - The raw reasoning text
   * @returns {object} - Formatted reasoning with steps and metadata
   */
  static formatReasoningSteps(reasoningText) {
    if (!reasoningText) return null;

    // Parse numbered steps (e.g., "1. Step content")
    const stepPattern = /^\d+\.\s+\*\*(.+?)\*\*(.+?)(?=^\d+\.\s|\z)/gms;
    const steps = [];
    let match;

    while ((match = stepPattern.exec(reasoningText)) !== null) {
      steps.push({
        title: match[1].trim(),
        content: match[2].trim()
      });
    }

    // If no numbered steps found, try to parse by line breaks
    if (steps.length === 0) {
      const lines = reasoningText.split('\n').filter(line => line.trim());
      lines.forEach((line, index) => {
        // Check if line starts with a number
        const numberedMatch = line.match(/^(\d+)\.\s*(.+)/);
        if (numberedMatch) {
          const title = numberedMatch[2].replace(/\*\*/g, '').trim();
          steps.push({
            number: numberedMatch[1],
            title: title,
            content: ''
          });
        } else if (steps.length > 0) {
          // Add to previous step's content
          steps[steps.length - 1].content += '\n' + line;
        }
      });
    }

    return {
      steps: steps,
      rawText: reasoningText,
      stepCount: steps.length,
      characterCount: reasoningText.length,
      wordCount: reasoningText.split(/\s+/).filter(w => w).length
    };
  }

  /**
   * Extract key insights from reasoning
   * @param {string} reasoningText - The reasoning text
   * @returns {array} - Array of key insights or decisions
   */
  static extractInsights(reasoningText) {
    if (!reasoningText) return [];

    const insights = [];

    // Look for decision points and key observations
    const patterns = [
      /decision:\s*(.+)/gi,
      /observation:\s*(.+)/gi,
      /note:\s*(.+)/gi,
      /important:\s*(.+)/gi,
      /key finding:\s*(.+)/gi
    ];

    patterns.forEach(pattern => {
      let match;
      while ((match = pattern.exec(reasoningText)) !== null) {
        insights.push(match[1].trim());
      }
    });

    return insights;
  }

  /**
   * Get summary statistics about the reasoning trace
   * @param {object} parsedContent - Parsed reasoning content
   * @returns {object} - Statistics about the reasoning
   */
  static getReasoningStats(parsedContent) {
    if (!parsedContent || !parsedContent.reasoning) {
      return {
        hasReasoning: false,
        reasoningLength: 0,
        answerLength: 0,
        reasoningRatio: 0
      };
    }

    const reasoningLength = parsedContent.reasoning.length;
    const answerLength = parsedContent.answer ? parsedContent.answer.length : 0;
    const totalLength = reasoningLength + answerLength;

    return {
      hasReasoning: true,
      reasoningLength: reasoningLength,
      answerLength: answerLength,
      totalLength: totalLength,
      reasoningRatio: totalLength > 0 ? (reasoningLength / totalLength * 100).toFixed(1) : 0,
      reasoningWords: parsedContent.reasoning.split(/\s+/).filter(w => w).length,
      answerWords: parsedContent.answer ? parsedContent.answer.split(/\s+/).filter(w => w).length : 0
    };
  }

  /**
   * Format reasoning for export
   * @param {object} parsedContent - Parsed reasoning content
   * @param {boolean} includeReasoning - Whether to include reasoning in export
   * @returns {string} - Formatted text for export
   */
  static formatForExport(parsedContent, includeReasoning = true) {
    if (!parsedContent) return '';

    let exportText = '';

    if (includeReasoning && parsedContent.reasoning) {
      exportText += '=== MODEL REASONING ===\n\n';
      exportText += parsedContent.reasoning;
      exportText += '\n\n=== FINAL OUTPUT ===\n\n';
    }

    if (parsedContent.answer) {
      exportText += parsedContent.answer;
    }

    return exportText;
  }
}

// Export for use in other scripts
window.ReasoningParser = ReasoningParser;
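A usage sketch for the class above, driven with a made-up reasoning-style string (illustrative input only; it is not real model output):

```javascript
const modelOutput = [
  '<think>',
  '1. **Identify the layout** The page is a two-column medical form.',
  '2. **Transcribe the fields** Patient name, date of birth, dosage table.',
  '</think>',
  '<answer>',
  '# Patient Record',
  '',
  '| Field | Value |',
  '|-------|-------|',
  '</answer>'
].join('\n');

if (ReasoningParser.detectReasoningTrace(modelOutput)) {
  const parsed = ReasoningParser.parseReasoningContent(modelOutput);
  const steps = ReasoningParser.formatReasoningSteps(parsed.reasoning);
  const stats = ReasoningParser.getReasoningStats(parsed);

  console.log(stats.reasoningWords, 'reasoning words');
  console.log(stats.reasoningRatio + '% of the combined output');
  console.log(steps.stepCount, 'numbered steps detected');
  console.log(parsed.answer); // the markdown shown as "Final Output"
}
```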
linkedin-post.txt
ADDED
@@ -0,0 +1,18 @@
How well do VLM-based OCR models handle Victorian theatre playbills?

Last week I shared OCR Time Capsule for comparing traditional vs VLM-based OCR. I've now added some examples from challenging collections: The British Library's Theatrical playbills from Britain and Ireland collection.

These 150-year-old documents are brutal for OCR:
- Decorative fonts in every size imaginable
- Multi-column layouts with text at odd angles
- Faded ink and show-through from the reverse
- ALL CAPS DRAMATIC ANNOUNCEMENTS!!!

For this dataset I used the RolmOCR model from Reducto (processed via HF Jobs - love how easy UV scripts make GPU inference!). The results? The improvements over traditional OCR are even more dramatic than with exam papers.

Explore the app: https://huggingface.co/spaces/davanstrien/ocr-time-capsule
BL Theatre dataset: https://bl.iro.bl.uk/concern/datasets/a8534aff-c8e3-4fc8-adc1-da542080b1e3

I'll continue to work through the suggestions I got last week but feel free to suggest other hairy OCR challenges to compare VLMs vs existing OCR!

#DigitalHumanities #OCR #GLAM #BritishLibrary #TheatreHistory
mobile-enhancement-plan.md
ADDED
@@ -0,0 +1,237 @@
# Mobile Enhancement Plan for OCR Time Capsule

## Overview

This document outlines the technical requirements for implementing comprehensive mobile support in OCR Time Capsule. While the application claims mobile support, the current implementation has significant limitations that prevent a good mobile user experience.

**Estimated Effort:** 800-1,200 lines of code changes
**Complexity:** Medium-High
**Development Time:** 3-5 days for full implementation, 2 days for MVP

## Current Mobile Limitations

1. **Fixed desktop layout** - Rigid 1/3 + 2/3 split doesn't adapt to small screens
2. **No touch support** - Navigation relies entirely on keyboard shortcuts
3. **Fixed positioning issues** - Footer overlaps content on mobile browsers
4. **Small touch targets** - Buttons/inputs too small for finger interaction
5. **Desktop-only interactions** - Hover states, dropdown menus not touch-friendly
6. **Overflow problems** - Content gets cut off due to fixed heights

## Required Changes

### 1. Layout Restructuring (Critical)

**Current:** Fixed side-by-side layout
```html
<!-- Current structure -->
<div class="flex-1 flex h-full">
  <div class="w-1/3">...</div> <!-- Image panel -->
  <div class="flex-1">...</div> <!-- Text panel -->
</div>
```

**Required:** Responsive stacked layout
```html
<!-- Mobile-first approach -->
<div class="flex flex-col md:flex-row h-full">
  <div class="w-full md:w-1/3">...</div>
  <div class="w-full md:flex-1">...</div>
</div>
```

**Changes needed:**
- Update all layout containers in `index.html` (~50 lines)
- Add mobile-specific CSS classes (~100 lines)
- Implement collapsible image panel for mobile

### 2. Touch Navigation Implementation

**New JavaScript required in `app.js`:**
```javascript
// Touch gesture handling
let touchStartX = 0;
let touchEndX = 0;

initTouchNavigation() {
  const container = document.getElementById('main-content');

  container.addEventListener('touchstart', (e) => {
    touchStartX = e.changedTouches[0].screenX;
  });

  container.addEventListener('touchend', (e) => {
    touchEndX = e.changedTouches[0].screenX;
    this.handleSwipe();
  });
}

handleSwipe() {
  const swipeThreshold = 50;
  const diff = touchStartX - touchEndX;

  if (Math.abs(diff) > swipeThreshold) {
    if (diff > 0) {
      this.nextSample(); // Swipe left
    } else {
      this.previousSample(); // Swipe right
    }
  }
}
```

**Scope:** ~150 lines for complete touch support including:
- Swipe detection
- Touch feedback
- Gesture velocity calculation
- Preventing accidental triggers

### 3. Mobile Navigation UI

**Replace fixed footer with mobile-friendly navigation:**
```html
<!-- Mobile navigation bar -->
<nav class="md:hidden fixed bottom-0 left-0 right-0 bg-white dark:bg-gray-800 border-t">
  <div class="grid grid-cols-3 h-16">
    <button class="flex items-center justify-center" @click="previousSample()">
      <svg class="w-8 h-8">...</svg>
    </button>
    <button class="flex items-center justify-center" @click="showPageSelector = true">
      <span class="text-lg font-medium" x-text="`${currentIndex + 1}/${totalSamples}`"></span>
    </button>
    <button class="flex items-center justify-center" @click="nextSample()">
      <svg class="w-8 h-8">...</svg>
    </button>
  </div>
</nav>
```

**Changes:** ~100 lines for navigation components

### 4. Touch-Friendly Components

**Update all interactive elements:**
- Minimum touch target size: 44x44px
- Add `touch-action` CSS properties
- Increase padding on all buttons
- Replace hover menus with tap-to-open modals

**Example button update:**
```html
<!-- Before -->
<button class="px-2 py-1 text-sm">Load</button>

<!-- After -->
<button class="px-4 py-3 md:px-2 md:py-1 text-base md:text-sm min-w-[44px] min-h-[44px] md:min-w-0 md:min-h-0">
  Load
</button>
```

### 5. Mobile Dock/Gallery

**Transform desktop dock to mobile carousel:**
```javascript
// Mobile-optimized thumbnail gallery
initMobileGallery() {
  this.mobileGallery = {
    currentIndex: 0,
    itemsPerView: 3,
    thumbnails: []
  };

  // Horizontal scroll with snap points
  const gallery = document.getElementById('mobile-gallery');
  gallery.style.scrollSnapType = 'x mandatory';
  gallery.style.overflowX = 'auto';
  gallery.style.webkitOverflowScrolling = 'touch';
}
```

**Scope:** ~200 lines for mobile gallery implementation

### 6. Responsive Breakpoints

**Implement proper breakpoint system:**
```css
/* Mobile first approach */
/* Base: Mobile (< 640px) */
.container {
  display: block;
  padding: 1rem;
}

/* Tablet (640px - 1024px) */
@media (min-width: 640px) {
  .container {
    display: flex;
    padding: 1.5rem;
  }
}

/* Desktop (> 1024px) */
@media (min-width: 1024px) {
  .container {
    padding: 2rem;
  }
}
```

### 7. Performance Optimizations

**Mobile-specific optimizations:**
- Lazy load images with Intersection Observer (see the sketch after this plan)
- Reduce initial JavaScript bundle
- Implement virtual scrolling for large datasets
- Add `will-change` CSS for smooth animations

## Implementation Approach

### Phase 1: MVP (2 days)
1. Basic responsive layout
2. Touch navigation (swipe gestures)
3. Mobile-friendly buttons
4. Fix overflow issues

### Phase 2: Enhanced Mobile UX (2 days)
1. Mobile navigation bar
2. Touch-optimized dock
3. Page selector modal
4. Gesture refinements

### Phase 3: Polish (1 day)
1. Performance optimizations
2. PWA features
3. Cross-device testing
4. Documentation

## Testing Requirements

### Devices to Test
- **iOS:** iPhone SE, iPhone 12/13, iPad
- **Android:** Various screen sizes (5", 6", 7")
- **Browsers:** Safari iOS, Chrome Android, Firefox Mobile

### Key Test Scenarios
1. Portrait/landscape orientation changes
2. Touch gesture accuracy
3. Text readability at different zoom levels
4. Navigation button accessibility
5. Image loading performance on slow connections

## Code Impact Summary

| Component | Lines Changed | Complexity |
|-----------|---------------|------------|
| HTML Layout | 150-200 | Medium |
| CSS/Tailwind | 200-300 | Low-Medium |
| Touch Events | 150 | High |
| Mobile Navigation | 100 | Medium |
| Gallery/Dock | 200 | High |
| **Total** | **800-1,200** | **Medium-High** |

## Priority Recommendations

1. **Must Have:** Responsive layout, basic touch navigation
2. **Should Have:** Mobile navigation bar, touch-friendly buttons
3. **Nice to Have:** Gesture refinements, PWA features, animations

The most critical change is the layout restructuring - without this, other mobile features won't work properly. Start there and build up progressively.
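The plan's performance section calls for lazy-loading images with Intersection Observer; a minimal sketch of that idea, assuming thumbnails are emitted as `<img data-src="...">` placeholders (an assumed markup convention, not something the plan specifies):

```javascript
// Lazy-loading sketch: swap data-src for src only when a thumbnail nears the viewport.
const lazyImages = document.querySelectorAll('img[data-src]');

const observer = new IntersectionObserver((entries, obs) => {
  for (const entry of entries) {
    if (!entry.isIntersecting) continue;
    const img = entry.target;
    img.src = img.dataset.src;      // start the real download
    img.removeAttribute('data-src');
    obs.unobserve(img);             // each image only needs one load
  }
}, { rootMargin: '200px' });        // prefetch slightly before entering view

lazyImages.forEach(img => observer.observe(img));
```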
multi-ocr-comparison-ui-patterns.md
ADDED
@@ -0,0 +1,277 @@
# Multi-OCR Engine Comparison UI Patterns

## Executive Summary

This document outlines UI design patterns for comparing the results of 5+ OCR engines in the OCR Time Capsule application. Based on research of existing comparison tools and UI best practices, we recommend a hybrid approach combining selective comparison, matrix views, and progressive disclosure.

## Key Design Constraints

1. **Human Cognitive Limits**: Users can effectively compare 3-7 items simultaneously
2. **Screen Real Estate**: Limited horizontal space for side-by-side comparisons
3. **Information Density**: Need to show both text content and metadata
4. **Performance**: Rendering 5+ full texts simultaneously can impact performance

## Recommended UI Patterns

### 1. Selective Comparison Mode (Primary Recommendation)

Allow users to select 2-4 engines for detailed comparison from a larger set.

```
+--------------------------------------------------------------+
| Select OCR Engines to Compare:                               |
| [x] Tesseract 5.0   [x] Google Vision   [x] AWS Textract     |
| [ ] Azure AI        [ ] PaddleOCR       [ ] Surya OCR        |
| [ ] EasyOCR         [ ] TrOCR           [ ] RolmOCR          |
|                                                              |
| [Compare Selected (3)]                                       |
+--------------------------------------------------------------+

After selection:
+---------+-------------+-------------+-------------+
| Image   | Tesseract   | Google      | AWS         |
| Preview | 5.0         | Vision      | Textract    |
+---------+-------------+-------------+-------------+
|         | Text output | Text output | Text output |
| [IMG]   | Lorem ipsum | Lorem ipsum | Lorem ipsum |
|         | dolor sit   | dolor sit   | dolar sit   |
|         | amet...     | amet...     | amet...     |
+---------+-------------+-------------+-------------+
```

**Advantages:**
- Maintains readable comparison
- User controls complexity
- Scalable to any number of engines

### 2. Matrix/Grid Overview

Show all results in a compact grid with expand/collapse functionality.

```
+-------------------------------------------------------+
| OCR Engine Comparison Matrix                          |
+------------+----------+----------+----------+---------+
| Engine     | Accuracy | Time(ms) | Preview  | Action  |
+------------+----------+----------+----------+---------+
| Tesseract  | 94.2%    | 1250     | Lorem... | [View]  |
| Google     | 98.1%    | 320      | Lorem... | [View]  |
| AWS        | 97.5%    | 410      | Lorem... | [View]  |
| Azure      | 96.8%    | 380      | Lorem... | [View]  |
| PaddleOCR  | 95.3%    | 890      | Lorem... | [View]  |
| Surya      | 93.7%    | 1100     | Lorem... | [View]  |
+------------+----------+----------+----------+---------+

Click [View] to see full text in modal/sidebar
```

**Advantages:**
- Shows all engines at once
- Easy to scan metrics
- Detailed view on demand

### 3. Reference + Diff View

Select one OCR result as reference and show diffs from others.

```
+-----------------------------------------------------------+
| Reference: Google Vision OCR                              |
| +-------------------------------------------------------+ |
| | Lorem ipsum dolor sit amet, consectetur adipiscing    | |
| | elit, sed do eiusmod tempor incididunt ut labore      | |
| +-------------------------------------------------------+ |
|                                                           |
| Differences from Reference:                               |
| +-------------+---------------------------------------+   |
| | Tesseract   | -dolor +dolar (char 12)               |   |
| |             | -adipiscing +adipiscing (char 38)     |   |
| +-------------+---------------------------------------+   |
| | AWS         | -consectetur +consektetur (char 27)   |   |
| +-------------+---------------------------------------+   |
| | Azure       | No differences                        |   |
| +-------------+---------------------------------------+   |
+-----------------------------------------------------------+
```

**Advantages:**
- Reduces visual complexity
- Easy to see variations
- Good for finding consensus

### 4. Accordion/Tab Hybrid

Combine tabs for primary views with accordions for details.

```
+-----------------------------------------------------------+
| [Overview] [Side-by-Side] [Consensus] [Analytics]         |
+-----------------------------------------------------------+
| Overview Tab:                                             |
|                                                           |
| ▼ Tesseract 5.0 (94.2% accuracy)                          |
|   Lorem ipsum dolor sit amet...                           |
|   [Show full text] [Compare with others]                  |
|                                                           |
| ▶ Google Vision (98.1% accuracy)                          |
| ▶ AWS Textract (97.5% accuracy)                           |
| ▶ Azure AI (96.8% accuracy)                               |
| ▶ PaddleOCR (95.3% accuracy)                              |
+-----------------------------------------------------------+
```

**Advantages:**
- Progressive disclosure
- Maintains context
- Flexible navigation

### 5. Consensus/Voting View

Show agreement levels between engines.

```
+-----------------------------------------------------------+
| Consensus View - 6 OCR Engines                            |
+-----------------------------------------------------------+
| Lorem ipsum █████ sit amet, ████████████ adipiscing       |
|             ^^^^^            ^^^^^^^^^^^^                 |
|             5/6 agree        6/6 agree (consensus)        |
|                                                           |
| Disagreements:                                            |
|   Position 12-16: "dolor"                                 |
|     - Tesseract: "dolar" (1 vote)                         |
|     - Others: "dolor" (5 votes) ✓                         |
|                                                           |
|   Position 27-38: "consectetur"                           |
|     - AWS: "consektetur" (1 vote)                         |
|     - Others: "consectetur" (5 votes) ✓                   |
+-----------------------------------------------------------+
```

**Advantages:**
- Shows confidence levels
- Identifies problem areas
- Good for quality assessment

### 6. Layered Comparison

Stack results with transparency/overlay controls.

```
+----------------------------------------------------------+
| Layer Controls:                      Opacity   Visible   |
| +-------------------------------+   +------------------+ |
| |                               |   | Tesseract        | |
| |     [Overlaid Text View]      |   +------------------+ |
| |                               |   | Google           | |
| |  Multiple colored layers      |   +------------------+ |
| |  showing differences          |   | AWS              | |
| +-------------------------------+   +------------------+ |
+----------------------------------------------------------+
```

**Advantages:**
- Visual diff representation
- Adjustable comparison
- Good for alignment issues

## Metadata Display Patterns

### Inline Badges
```
+-------------------------------------------+
| Tesseract 5.0  [94.2%] [1.2s] [MIT]       |
| Lorem ipsum dolor sit amet...             |
+-------------------------------------------+
```

### Hover Cards
```
+-------------------------------------------+
| Google Vision                             |
|   +---------------------+                 |
|   | Accuracy: 98.1%     |  (on hover)     |
|   | Time: 320ms         |                 |
|   | Cost: $0.0015       |                 |
|   | Language: Multi     |                 |
|   +---------------------+                 |
+-------------------------------------------+
```

## Navigation Patterns

### 1. Engine Selector Bar
```
[All] [High Accuracy] [Fast] [Open Source] [Custom Group]
```

### 2. Quick Switch
```
Previous Engine   [Tesseract ▼]   Next Engine
                   Google Vision
                   AWS Textract
                   Azure AI
```

### 3. Comparison History
```
Recent Comparisons:
• Tesseract vs Google vs AWS (2 min ago)
• All engines - Page 15 (5 min ago)
• Azure vs PaddleOCR (10 min ago)
```

## Mobile Considerations

For mobile devices, use a stacked card approach:

```
+-----------------+
| Original Image  |
+-----------------+
| Tesseract 94.2% |
| ▼ Show text     |
+-----------------+
| Google 98.1%    |
| ▶ Show text     |
+-----------------+
| AWS 97.5%       |
| ▶ Show text     |
+-----------------+
```

## Performance Optimizations

1. **Lazy Loading**: Only load full text when expanded/selected
2. **Virtual Scrolling**: For long documents
3. **Caching**: Store OCR results client-side
4. **Progressive Enhancement**: Start with 2-3 engines, load more on demand

## Recommended Implementation Priority

1. **Phase 1**: Selective Comparison (2-4 engines)
2. **Phase 2**: Matrix Overview with metrics
3. **Phase 3**: Consensus/Voting view
4. **Phase 4**: Advanced features (layers, history, etc.)

## Accessibility Considerations

- Keyboard navigation between engines
- Screen reader announcements for differences
- High contrast mode for diff highlighting
- Alternative text descriptions for visual comparisons

## Conclusion

The selective comparison pattern combined with a matrix overview provides the best balance of usability and functionality for comparing 5+ OCR engines. This approach:

- Respects cognitive limits (3-7 items)
- Provides overview and detail views
- Scales to any number of engines
- Maintains performance
- Works on mobile devices

The key is progressive disclosure: show summary information for all engines, but limit detailed comparison to user-selected subsets.
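A small sketch of the selection state behind the recommended selective-comparison pattern, capping the detailed view at a handful of engines (hypothetical field names; nothing in the app implements this yet):

```javascript
// Selection state for the "compare 2-4 engines" flow described above.
const comparisonState = {
  engines: ['tesseract', 'google-vision', 'aws-textract', 'azure-ai', 'paddleocr', 'surya'],
  selected: new Set(),
  maxSelected: 4 // keep within the 3-7 item cognitive limit, leaving room for the image column
};

function toggleEngine(state, engine) {
  if (state.selected.has(engine)) {
    state.selected.delete(engine);
  } else if (state.selected.size < state.maxSelected) {
    state.selected.add(engine);
  }
  // else: ignore the click or surface a "deselect one first" hint
  return [...state.selected];
}

toggleEngine(comparisonState, 'tesseract');
toggleEngine(comparisonState, 'google-vision');
console.log([...comparisonState.selected]); // ['tesseract', 'google-vision']
```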