Use external img url to replace local img (#25)
- Use external img url to replace local img (c380804c76a3117855b431bd4376b0cae2176810)
README.md
CHANGED
@@ -44,7 +44,7 @@ PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vi
 </div>
 
 <div align="center">
-<img src="
+<img src="https://huggingface.co/datasets/PaddlePaddle/PaddleOCR-VL_demo/resolve/main/imgs/allmetric.png" width="800"/>
 </div>
 
 ## Introduction
@@ -67,7 +67,7 @@ PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vi
 <!-- PaddleOCR-VL decomposes the complex task of document parsing into a two stages. The first stage, PP-DocLayoutV2, is responsible for layout analysis, where it localizes semantic regions and predicts their reading order. Subsequently, the second stage, PaddleOCR-VL-0.9B, leverages these layout predictions to perform fine-grained recognition of diverse content, including text, tables, formulas, and charts. Finally, a lightweight post-processing module aggregates the outputs from both stages and formats the final document into structured Markdown and JSON. -->
 
 <div align="center">
-<img src="
+<img src="https://huggingface.co/datasets/PaddlePaddle/PaddleOCR-VL_demo/resolve/main/imgs/paddleocrvl.png" width="800"/>
 </div>
 
 
@@ -150,7 +150,7 @@ for res in output:
 ##### PaddleOCR-VL achieves SOTA performance for overall, text, formula, tables and reading order on OmniDocBench v1.5
 
 <div align="center">
-<img src="
+<img src="https://huggingface.co/datasets/PaddlePaddle/PaddleOCR-VL_demo/resolve/main/imgs/omni15.png" width="800"/>
 </div>
 
 
@@ -161,7 +161,7 @@ for res in output:
 
 
 <div align="center">
-<img src="
+<img src="https://huggingface.co/datasets/PaddlePaddle/PaddleOCR-VL_demo/resolve/main/imgs/omni10.png" width="800"/>
 </div>
 
 
@@ -178,7 +178,7 @@ for res in output:
 PaddleOCR-VL’s robust and versatile capability in handling diverse document types, establishing it as the leading method in the OmniDocBench-OCR-block performance evaluation.
 
 <div align="center">
-<img src="
+<img src="https://huggingface.co/datasets/PaddlePaddle/PaddleOCR-VL_demo/resolve/main/imgs/omnibenchocr.png" width="800"/>
 </div>
 
 
@@ -187,7 +187,7 @@ PaddleOCR-VL’s robust and versatile capability in handling diverse document ty
 In-house-OCR provides a evaluation of performance across multiple languages and text types. Our model demonstrates outstanding accuracy with the lowest edit distances in all evaluated scripts.
 
 <div align="center">
-<img src="
+<img src="https://huggingface.co/datasets/PaddlePaddle/PaddleOCR-VL_demo/resolve/main/imgs/inhouseocr.png" width="800"/>
 </div>
 
 
@@ -199,7 +199,7 @@ In-house-OCR provides a evaluation of performance across multiple languages and
 Our self-built evaluation set contains diverse types of table images, such as Chinese, English, mixed Chinese-English, and tables with various characteristics like full, partial, or no borders, book/manual formats, lists, academic papers, merged cells, as well as low-quality, watermarked, etc. PaddleOCR-VL achieves remarkable performance across all categories.
 
 <div align="center">
-<img src="
+<img src="https://huggingface.co/datasets/PaddlePaddle/PaddleOCR-VL_demo/resolve/main/imgs/inhousetable.png" width="600"/>
 </div>
 
 #### 3. Formula
@@ -209,7 +209,7 @@ Our self-built evaluation set contains diverse types of table images, such as Ch
 In-house-Formula evaluation set contains simple prints, complex prints, camera scans, and handwritten formulas. PaddleOCR-VL demonstrates the best performance in every category.
 
 <div align="center">
-<img src="
+<img src="https://huggingface.co/datasets/PaddlePaddle/PaddleOCR-VL_demo/resolve/main/imgs/inhouse-formula.png" width="500"/>
 </div>
 
 
@@ -220,7 +220,7 @@ In-house-Formula evaluation set contains simple prints, complex prints, camera s
 The evaluation set is broadly categorized into 11 chart categories, including bar-line hybrid, pie, 100% stacked bar, area, bar, bubble, histogram, line, scatterplot, stacked area, and stacked bar. PaddleOCR-VL not only outperforms expert OCR VLMs but also surpasses some 72B-level multimodal language models.
 
 <div align="center">
-<img src="
+<img src="https://huggingface.co/datasets/PaddlePaddle/PaddleOCR-VL_demo/resolve/main/imgs/inhousechart.png" width="400"/>
 </div>
 
 
@@ -235,42 +235,42 @@ The evaluation set is broadly categorized into 11 chart categories, including ba
 ### Comprehensive Document Parsing
 
 <div align="center">
-<img src="
-<img src="
-<img src="
-<img src="
+<img src="https://huggingface.co/datasets/PaddlePaddle/PaddleOCR-VL_demo/resolve/main/imgs/overview1.jpg" width="600"/>
+<img src="https://huggingface.co/datasets/PaddlePaddle/PaddleOCR-VL_demo/resolve/main/imgs/overview2.jpg" width="600"/>
+<img src="https://huggingface.co/datasets/PaddlePaddle/PaddleOCR-VL_demo/resolve/main/imgs/overview3.jpg" width="600"/>
+<img src="https://huggingface.co/datasets/PaddlePaddle/PaddleOCR-VL_demo/resolve/main/imgs/overview4.jpg" width="600"/>
 </div>
 
 
 ### Text
 
 <div align="center">
-<img src="
-<img src="
+<img src="https://huggingface.co/datasets/PaddlePaddle/PaddleOCR-VL_demo/resolve/main/imgs/text_english_arabic.jpg" width="300" style="display: inline-block;"/>
+<img src="https://huggingface.co/datasets/PaddlePaddle/PaddleOCR-VL_demo/resolve/main/imgs/text_handwriting_02.jpg" width="300" style="display: inline-block;"/>
 </div>
 
 
 ### Table
 
 <div align="center">
-<img src="
-<img src="
+<img src="https://huggingface.co/datasets/PaddlePaddle/PaddleOCR-VL_demo/resolve/main/imgs/table_01.jpg" width="300" style="display: inline-block;"/>
+<img src="https://huggingface.co/datasets/PaddlePaddle/PaddleOCR-VL_demo/resolve/main/imgs/table_02.jpg" width="300" style="display: inline-block;"/>
 </div>
 
 
 ### Formula
 
 <div align="center">
-<img src="
-<img src="
+<img src="https://huggingface.co/datasets/PaddlePaddle/PaddleOCR-VL_demo/resolve/main/imgs/formula_EN.jpg" width="300" style="display: inline-block;"/>
+<img src="https://huggingface.co/datasets/PaddlePaddle/PaddleOCR-VL_demo/resolve/main/imgs/formula_ZH.jpg" width="300" style="display: inline-block;"/>
 </div>
 
 
 ### Chart
 
 <div align="center">
-<img src="
-<img src="
+<img src="https://huggingface.co/datasets/PaddlePaddle/PaddleOCR-VL_demo/resolve/main/imgs/chart_01.jpg" width="300" style="display: inline-block;"/>
+<img src="https://huggingface.co/datasets/PaddlePaddle/PaddleOCR-VL_demo/resolve/main/imgs/chart_02.jpg" width="300" style="display: inline-block;"/>
 </div>
 
 
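For anyone applying the same kind of change to another README, the substitution in this commit could be scripted roughly as follows. This is only a sketch, not the method the author used: the removed `src` values are truncated in the diff above, so the `./imgs/...` local path layout is an assumption, and `externalize_img_srcs` is a hypothetical helper name. Only the base URL is taken from the added lines.

```python
import re

# Base URL copied from the added lines of the diff.
EXTERNAL_BASE = (
    "https://huggingface.co/datasets/PaddlePaddle/PaddleOCR-VL_demo/"
    "resolve/main/imgs/"
)

# Matches an <img ... src="..."> whose src does not already start with http(s).
# Assumption: local images are referenced by a relative path such as
# "./imgs/name.png"; the real original paths are not visible in the diff.
_LOCAL_SRC = re.compile(r'(<img\b[^>]*?\bsrc=")(?!https?://)([^"]+)(")')


def externalize_img_srcs(markdown: str, base: str = EXTERNAL_BASE) -> str:
    """Rewrite relative <img src="..."> paths to base + file name."""

    def _swap(match: re.Match) -> str:
        # Keep only the file name; the directory part is replaced by `base`.
        filename = match.group(2).rsplit("/", 1)[-1]
        return f"{match.group(1)}{base}{filename}{match.group(3)}"

    return _LOCAL_SRC.sub(_swap, markdown)
```

Tags whose `src` is already an absolute `http`/`https` URL are left untouched, so running the helper twice on the same text gives the same result as running it once.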