Update README.md
--- a/README.md
+++ b/README.md
@@ -32,7 +32,7 @@ POINTS-Reader: Distillation-Free Adaptation of Vision-Language Models for Docume
 </a>
 </p>
 
-We are delighted to announce that the WePOINTS family has welcomed a new member:
+We are delighted to announce that the WePOINTS family has welcomed a new member: POINTS-Reader, a vision-language model for end-to-end document conversion.
 
 ## News
 
@@ -51,7 +51,7 @@ We are delighted to announce that the WePOINTS family has welcomed a new member:
 
 ## Results
 
-
+For comparison, we use the results reported by [OmniDocBench](https://github.com/opendatalab/OmniDocBench/tree/main) and POINTS-Reader. Compared with the version submitted to EMNLP 2025, the current release provides (1) improved performance and (2) support for Chinese documents. Both enhancements build upon the methods proposed in this paper.
 
 <table style="width: 92%; margin: auto; border-collapse: collapse;">
 <thead>
@@ -607,9 +607,9 @@ prompt = (
 image_path = '/path/to/your/local/image'
 model_path = 'tencent/POINTS-Reader'
 model = AutoModelForCausalLM.from_pretrained(model_path,
-
-
-
+                                             trust_remote_code=True,
+                                             torch_dtype=torch.float16,
+                                             device_map='cuda')
 tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
 image_processor = Qwen2ImageProcessorForPOINTSV15.from_pretrained(model_path)
 content = [
@@ -647,8 +647,8 @@ We will create a Pull Request to SGLang, please stay tuned.
 
 ## Known Issues
 
-- **Complex Document Parsing**: POINTS-Reader can struggle with complex layouts (e.g., newspapers), often producing repeated or missing content.
-- **Handwritten Document Parsing**: It also has difficulty handling handwritten inputs (e.g., receipts, notes), which can lead to recognition errors or omissions.
+- **Complex Document Parsing**: POINTS-Reader can struggle with complex layouts (e.g., newspapers), often producing repeated or missing content.
+- **Handwritten Document Parsing**: It also has difficulty handling handwritten inputs (e.g., receipts, notes), which can lead to recognition errors or omissions.
 - **Multi-language Document Parsing**: POINTS-Reader currently supports only English and Chinese, limiting its effectiveness on other languages.
 
 ## Citation
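
The repeated-content failure mode listed under Known Issues can be screened for after generation. The following is a minimal sketch and not part of POINTS-Reader: `find_repeated_lines` is a hypothetical helper, its thresholds are arbitrary starting points, and it assumes the model output is available as a plain string in which each parsed block sits on its own line.

```python
from collections import Counter


def find_repeated_lines(text: str, min_length: int = 20, max_repeats: int = 2) -> dict:
    """Return lines that occur more than `max_repeats` times in `text`.

    Hypothetical helper (not part of POINTS-Reader). Lines shorter than
    `min_length` characters are skipped, because separators and table
    rules repeat legitimately in converted documents.
    """
    counts = Counter(
        line.strip()
        for line in text.splitlines()
        if len(line.strip()) >= min_length
    )
    # Keep only the lines whose count exceeds the allowed number of repeats.
    return {line: n for line, n in counts.items() if n > max_repeats}
```

A non-empty result is only a hint that a page may need a second look; legitimately repetitive documents (forms, price lists) can trigger it as well.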