Tingquan committed on
Commit 7a81160 · verified · 1 Parent(s): 3206d80

Use external img url to replace local img (#25)

- Use external img url to replace local img (c380804c76a3117855b431bd4376b0cae2176810)
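A rewrite like the one in this commit could be scripted rather than done by hand. The sketch below is only an illustration and is not part of the commit: `BASE_URL` is copied from the added lines of the diff, while `LOCAL_IMG` and `externalize_images` are hypothetical names, and the pattern assumes every local image is written as `<img src="./imgs/...">`.

```python
import re

# Base URL copied from the "+" lines of this diff.
BASE_URL = "https://huggingface.co/datasets/PaddlePaddle/PaddleOCR-VL_demo/resolve/main/imgs/"

# Hypothetical pattern: matches src="./imgs/<file>" at the start of an <img> tag.
LOCAL_IMG = re.compile(r'(<img\s+src=")\./imgs/([^"]+)(")')


def externalize_images(markdown: str, base_url: str = BASE_URL) -> str:
    """Replace local ./imgs/ sources with absolute URLs; leave other text untouched."""
    return LOCAL_IMG.sub(
        lambda m: f"{m.group(1)}{base_url}{m.group(2)}{m.group(3)}",
        markdown,
    )


if __name__ == "__main__":
    sample = '<img src="./imgs/allmetric.png" width="800"/>'
    print(externalize_images(sample))
```

Attributes such as `width` and `style` sit outside the matched group, so they pass through unchanged, which matches what the diff shows.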

Files changed (1)
  1. README.md +21 -21
README.md CHANGED
@@ -44,7 +44,7 @@ PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vi
  </div>
 
  <div align="center">
- <img src="./imgs/allmetric.png" width="800"/>
+ <img src="https://huggingface.co/datasets/PaddlePaddle/PaddleOCR-VL_demo/resolve/main/imgs/allmetric.png" width="800"/>
  </div>
 
  ## Introduction
@@ -67,7 +67,7 @@ PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vi
  <!-- PaddleOCR-VL decomposes the complex task of document parsing into a two stages. The first stage, PP-DocLayoutV2, is responsible for layout analysis, where it localizes semantic regions and predicts their reading order. Subsequently, the second stage, PaddleOCR-VL-0.9B, leverages these layout predictions to perform fine-grained recognition of diverse content, including text, tables, formulas, and charts. Finally, a lightweight post-processing module aggregates the outputs from both stages and formats the final document into structured Markdown and JSON. -->
 
  <div align="center">
- <img src="./imgs/paddleocrvl.png" width="800"/>
+ <img src="https://huggingface.co/datasets/PaddlePaddle/PaddleOCR-VL_demo/resolve/main/imgs/paddleocrvl.png" width="800"/>
  </div>
 
 
@@ -150,7 +150,7 @@ for res in output:
  ##### PaddleOCR-VL achieves SOTA performance for overall, text, formula, tables and reading order on OmniDocBench v1.5
 
  <div align="center">
- <img src="./imgs/omni15.png" width="800"/>
+ <img src="https://huggingface.co/datasets/PaddlePaddle/PaddleOCR-VL_demo/resolve/main/imgs/omni15.png" width="800"/>
  </div>
 
 
@@ -161,7 +161,7 @@ for res in output:
 
 
  <div align="center">
- <img src="./imgs/omni10.png" width="800"/>
+ <img src="https://huggingface.co/datasets/PaddlePaddle/PaddleOCR-VL_demo/resolve/main/imgs/omni10.png" width="800"/>
  </div>
 
 
@@ -178,7 +178,7 @@ for res in output:
  PaddleOCR-VL’s robust and versatile capability in handling diverse document types, establishing it as the leading method in the OmniDocBench-OCR-block performance evaluation.
 
  <div align="center">
- <img src="./imgs/omnibenchocr.png" width="800"/>
+ <img src="https://huggingface.co/datasets/PaddlePaddle/PaddleOCR-VL_demo/resolve/main/imgs/omnibenchocr.png" width="800"/>
  </div>
 
 
@@ -187,7 +187,7 @@ PaddleOCR-VL’s robust and versatile capability in handling diverse document ty
  In-house-OCR provides a evaluation of performance across multiple languages and text types. Our model demonstrates outstanding accuracy with the lowest edit distances in all evaluated scripts.
 
  <div align="center">
- <img src="./imgs/inhouseocr.png" width="800"/>
+ <img src="https://huggingface.co/datasets/PaddlePaddle/PaddleOCR-VL_demo/resolve/main/imgs/inhouseocr.png" width="800"/>
  </div>
 
 
@@ -199,7 +199,7 @@ In-house-OCR provides a evaluation of performance across multiple languages and
  Our self-built evaluation set contains diverse types of table images, such as Chinese, English, mixed Chinese-English, and tables with various characteristics like full, partial, or no borders, book/manual formats, lists, academic papers, merged cells, as well as low-quality, watermarked, etc. PaddleOCR-VL achieves remarkable performance across all categories.
 
  <div align="center">
- <img src="./imgs/inhousetable.png" width="600"/>
+ <img src="https://huggingface.co/datasets/PaddlePaddle/PaddleOCR-VL_demo/resolve/main/imgs/inhousetable.png" width="600"/>
  </div>
 
  #### 3. Formula
@@ -209,7 +209,7 @@ Our self-built evaluation set contains diverse types of table images, such as Ch
  In-house-Formula evaluation set contains simple prints, complex prints, camera scans, and handwritten formulas. PaddleOCR-VL demonstrates the best performance in every category.
 
  <div align="center">
- <img src="./imgs/inhouse-formula.png" width="500"/>
+ <img src="https://huggingface.co/datasets/PaddlePaddle/PaddleOCR-VL_demo/resolve/main/imgs/inhouse-formula.png" width="500"/>
  </div>
 
 
@@ -220,7 +220,7 @@ In-house-Formula evaluation set contains simple prints, complex prints, camera s
  The evaluation set is broadly categorized into 11 chart categories, including bar-line hybrid, pie, 100% stacked bar, area, bar, bubble, histogram, line, scatterplot, stacked area, and stacked bar. PaddleOCR-VL not only outperforms expert OCR VLMs but also surpasses some 72B-level multimodal language models.
 
  <div align="center">
- <img src="./imgs/inhousechart.png" width="400"/>
+ <img src="https://huggingface.co/datasets/PaddlePaddle/PaddleOCR-VL_demo/resolve/main/imgs/inhousechart.png" width="400"/>
  </div>
 
 
@@ -235,42 +235,42 @@ The evaluation set is broadly categorized into 11 chart categories, including ba
  ### Comprehensive Document Parsing
 
  <div align="center">
- <img src="./imgs/overview1.jpg" width="600"/>
- <img src="./imgs/overview2.jpg" width="600"/>
- <img src="./imgs/overview3.jpg" width="600"/>
- <img src="./imgs/overview4.jpg" width="600"/>
+ <img src="https://huggingface.co/datasets/PaddlePaddle/PaddleOCR-VL_demo/resolve/main/imgs/overview1.jpg" width="600"/>
+ <img src="https://huggingface.co/datasets/PaddlePaddle/PaddleOCR-VL_demo/resolve/main/imgs/overview2.jpg" width="600"/>
+ <img src="https://huggingface.co/datasets/PaddlePaddle/PaddleOCR-VL_demo/resolve/main/imgs/overview3.jpg" width="600"/>
+ <img src="https://huggingface.co/datasets/PaddlePaddle/PaddleOCR-VL_demo/resolve/main/imgs/overview4.jpg" width="600"/>
  </div>
 
 
  ### Text
 
  <div align="center">
- <img src="./imgs/text_english_arabic.jpg" width="300" style="display: inline-block;"/>
- <img src="./imgs/text_handwriting_02.jpg" width="300" style="display: inline-block;"/>
+ <img src="https://huggingface.co/datasets/PaddlePaddle/PaddleOCR-VL_demo/resolve/main/imgs/text_english_arabic.jpg" width="300" style="display: inline-block;"/>
+ <img src="https://huggingface.co/datasets/PaddlePaddle/PaddleOCR-VL_demo/resolve/main/imgs/text_handwriting_02.jpg" width="300" style="display: inline-block;"/>
  </div>
 
 
  ### Table
 
  <div align="center">
- <img src="./imgs/table_01.jpg" width="300" style="display: inline-block;"/>
- <img src="./imgs/table_02.jpg" width="300" style="display: inline-block;"/>
+ <img src="https://huggingface.co/datasets/PaddlePaddle/PaddleOCR-VL_demo/resolve/main/imgs/table_01.jpg" width="300" style="display: inline-block;"/>
+ <img src="https://huggingface.co/datasets/PaddlePaddle/PaddleOCR-VL_demo/resolve/main/imgs/table_02.jpg" width="300" style="display: inline-block;"/>
  </div>
 
 
  ### Formula
 
  <div align="center">
- <img src="./imgs/formula_EN.jpg" width="300" style="display: inline-block;"/>
- <img src="./imgs/formula_ZH.jpg" width="300" style="display: inline-block;"/>
+ <img src="https://huggingface.co/datasets/PaddlePaddle/PaddleOCR-VL_demo/resolve/main/imgs/formula_EN.jpg" width="300" style="display: inline-block;"/>
+ <img src="https://huggingface.co/datasets/PaddlePaddle/PaddleOCR-VL_demo/resolve/main/imgs/formula_ZH.jpg" width="300" style="display: inline-block;"/>
  </div>
 
 
  ### Chart
 
  <div align="center">
- <img src="./imgs/chart_01.jpg" width="300" style="display: inline-block;"/>
- <img src="./imgs/chart_02.jpg" width="300" style="display: inline-block;"/>
+ <img src="https://huggingface.co/datasets/PaddlePaddle/PaddleOCR-VL_demo/resolve/main/imgs/chart_01.jpg" width="300" style="display: inline-block;"/>
+ <img src="https://huggingface.co/datasets/PaddlePaddle/PaddleOCR-VL_demo/resolve/main/imgs/chart_02.jpg" width="300" style="display: inline-block;"/>
  </div>
 
 