Commit e6f2eb8 · Parent(s): 04ea7bb
update: docs with lib sources and notes
- docs/document_loader/image_loader/fitzpil_img_loader.md +19 -0
- docs/document_loader/image_loader/marker_img_loader.md +18 -0
- docs/document_loader/image_loader/pdf2image_img_loader.md +23 -0
- docs/document_loader/image_loader/pdfplumber_img_loader.md +19 -0
- docs/document_loader/image_loader/pymupdf_img_loader.md +20 -0
docs/document_loader/image_loader/fitzpil_img_loader.md
CHANGED
@@ -1,3 +1,22 @@
 # Load images from PDF files (using Fitz & PIL)

+??? note "Note"
+**Underlying Library:** `fitz` & `pillow`
+
+Extract images from PDF files using `fitz` and `pillow`.
+
+Use it in our library with:
+```python
+from medrag_multi_modal.document_loader.image_loader import FitzPILImageLoader
+```
+
+For more details, please refer to the sources below.
+
+**Sources:**
+
+- [Docs](https://pymupdf.readthedocs.io/en/latest/intro.html)
+- [GitHub](https://github.com/pymupdf/PyMuPDF)
+- [PyPI: PyMuPDF](https://pypi.org/project/PyMuPDF/)
+- [PyPI: pillow](https://pypi.org/project/pillow/)
+
 ::: medrag_multi_modal.document_loader.image_loader.fitzpil_img_loader
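
As a rough illustration of what the note above describes, here is a minimal sketch of extracting embedded images with `fitz` (the import name of PyMuPDF) and Pillow. It is not the `FitzPILImageLoader` implementation. The function names `image_filename` and `extract_embedded_images`, the file-naming scheme, and the zero-based page index are assumptions made for this example. The third-party imports are deferred into the function, so they are only needed when it is called.

```python
import io
from pathlib import Path


def image_filename(page_idx: int, img_idx: int, ext: str = "png") -> str:
    """Build a predictable file name (hypothetical naming scheme)."""
    return f"page{page_idx}_fig{img_idx}.{ext}"


def extract_embedded_images(pdf_path: str, page_idx: int, out_dir: str) -> list:
    """Save every image embedded in one page (zero-based index) as PNG."""
    import fitz  # PyMuPDF; assumes `pip install pymupdf`
    from PIL import Image  # assumes `pip install pillow`

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    with fitz.open(pdf_path) as doc:
        page = doc[page_idx]
        for img_idx, img in enumerate(page.get_images(full=True)):
            xref = img[0]  # first tuple item is the image's xref number
            raw = doc.extract_image(xref)["image"]  # encoded image bytes
            pil_img = Image.open(io.BytesIO(raw))
            # PNG cannot store every mode (for example CMYK), so normalize.
            if pil_img.mode not in ("RGB", "RGBA", "L"):
                pil_img = pil_img.convert("RGB")
            target = out / image_filename(page_idx, img_idx)
            pil_img.save(target)
            saved.append(str(target))
    return saved
```

Calling `extract_embedded_images("paper.pdf", 0, "out")` would write one PNG per embedded image on the first page, assuming `paper.pdf` exists and both libraries are installed.
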
docs/document_loader/image_loader/marker_img_loader.md
CHANGED
@@ -1,3 +1,21 @@
 # Load images from PDF files (using Marker)

+??? note "Note"
+**Underlying Library:** `marker-pdf`
+
+Extract images from PDF files using `marker-pdf`.
+
+Use it in our library with:
+```python
+from medrag_multi_modal.document_loader.image_loader import MarkerImageLoader
+```
+
+For details, please refer to the sources below.
+
+**Sources:**
+
+- [DataLab](https://www.datalab.to)
+- [GitHub](https://github.com/VikParuchuri/marker)
+- [PyPI](https://pypi.org/project/marker-pdf/)
+
 ::: medrag_multi_modal.document_loader.image_loader.marker_img_loader
docs/document_loader/image_loader/pdf2image_img_loader.md
CHANGED
@@ -1,3 +1,26 @@
 # Load images from PDF files (using PDF2Image)

+!!! danger "Warning"
+Unlike other image extraction methods in `document_loader.image_loader`, this loader does not extract embedded images from the PDF.
+Instead, it creates a snapshot image version of each selected page from the PDF.
+
+??? note "Note"
+**Underlying Library:** `pdf2image`
+
+Extract images from PDF files using `pdf2image`.
+
+
+Use it in our library with:
+```python
+from medrag_multi_modal.document_loader.image_loader import PDF2ImageLoader
+```
+
+For details and available `**kwargs`, please refer to the sources below.
+
+**Sources:**
+
+- [GitHub](https://github.com/Belval/pdf2image)
+- [PyPI](https://pypi.org/project/pdf2image/)
+
 ::: medrag_multi_modal.document_loader.image_loader.pdf2image_img_loader
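
To make the warning above concrete, here is a minimal sketch of page snapshotting with `pdf2image`. It is not the `PDF2ImageLoader` implementation. The helper names `snapshot_kwargs` and `snapshot_page` and the zero-based page index are assumptions made for this example. `convert_from_path` counts pages from 1 and needs Poppler installed on the system.

```python
def snapshot_kwargs(page_idx: int, dpi: int = 200) -> dict:
    """Translate a zero-based page index into pdf2image's one-based arguments."""
    if page_idx < 0:
        raise ValueError("page_idx must be non-negative")
    return {"dpi": dpi, "first_page": page_idx + 1, "last_page": page_idx + 1}


def snapshot_page(pdf_path: str, page_idx: int, dpi: int = 200):
    """Render one whole page as a PIL image (not its embedded figures)."""
    # Assumes `pip install pdf2image` and a Poppler install on the PATH.
    from pdf2image import convert_from_path

    images = convert_from_path(pdf_path, **snapshot_kwargs(page_idx, dpi))
    if not images:
        raise IndexError(f"page {page_idx} is not in {pdf_path}")
    return images[0]
```

The result is a picture of the full page, including text and layout, which is why this approach behaves differently from loaders that pull out embedded image objects.
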
docs/document_loader/image_loader/pdfplumber_img_loader.md
CHANGED
@@ -1,3 +1,22 @@
 # Load images from PDF files (using PDFPlumber)

+??? note "Note"
+**Underlying Library:** `pdfplumber`
+
+Extract images from PDF files using `pdfplumber`.
+
+You can interact with the underlying library and fine-tune the outputs via `**kwargs`.
+
+Use it in our library with:
+```python
+from medrag_multi_modal.document_loader.image_loader import PDFPlumberImageLoader
+```
+
+For details, please refer to the sources below.
+
+**Sources:**
+
+- [GitHub](https://github.com/jsvine/pdfplumber)
+- [PyPI](https://pypi.org/project/pdfplumber/)
+
 ::: medrag_multi_modal.document_loader.image_loader.pdfplumber_img_loader
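
One way image extraction with `pdfplumber` can work is to crop each image's bounding box out of the rendered page. The sketch below shows that idea. It is not the `PDFPlumberImageLoader` implementation. The helper names `clamp_bbox` and `crop_page_images`, the default resolution, and the assumption that the page's coordinate origin is `(0, 0)` are all choices made for this example.

```python
def clamp_bbox(image: dict, page_width: float, page_height: float) -> tuple:
    """Clip an image's bounding box to the page so cropping stays in bounds."""
    x0 = max(0.0, float(image["x0"]))
    top = max(0.0, float(image["top"]))
    x1 = min(float(page_width), float(image["x1"]))
    bottom = min(float(page_height), float(image["bottom"]))
    return (x0, top, x1, bottom)


def crop_page_images(pdf_path: str, page_idx: int, resolution: int = 150) -> list:
    """Return one PIL image per image object found on a page (zero-based index)."""
    import pdfplumber  # assumes `pip install pdfplumber`

    crops = []
    with pdfplumber.open(pdf_path) as pdf:
        page = pdf.pages[page_idx]
        for image in page.images:
            bbox = clamp_bbox(image, page.width, page.height)
            # Skip boxes that collapse to nothing after clamping.
            if bbox[0] >= bbox[2] or bbox[1] >= bbox[3]:
                continue
            rendered = page.crop(bbox).to_image(resolution=resolution)
            crops.append(rendered.original)  # underlying PIL image
    return crops
```

Because the crops are rendered from the page, their pixel size depends on `resolution` rather than on the resolution of the image stored in the PDF.
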
docs/document_loader/image_loader/pymupdf_img_loader.md
CHANGED
@@ -1,3 +1,23 @@
 # Load images from PDF files (using PyMuPDF)

+??? note "Note"
+**Underlying Library:** `pymupdf`
+
+PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
+
+You can interact with the underlying library and fine-tune the outputs via `**kwargs`.
+
+Use it in our library with:
+```python
+from medrag_multi_modal.document_loader.image_loader import PyMuPDFImageLoader
+```
+
+For details, please refer to the sources below.
+
+**Sources:**
+
+- [Docs](https://pymupdf.readthedocs.io/en/latest/)
+- [GitHub](https://github.com/pymupdf/PyMuPDF)
+- [PyPI](https://pypi.org/project/PyMuPDF/)
+
 ::: medrag_multi_modal.document_loader.image_loader.pymupdf_img_loader