tencent/POINTS-Reader

@@ -32,7 +32,7 @@ POINTS-Reader: Distillation-Free Adaptation of Vision-Language Models for Docume
   </a>
 </p>
-We are delighted to announce that the WePOINTS family has welcomed a new member: [POINTS-Reader](https://github.com/Tencent/POINTS-Reader), a vision-language model for end-to-end document conversion.
+We are delighted to announce that the WePOINTS family has welcomed a new member: POINTS-Reader, a vision-language model for end-to-end document conversion.
 ## News
@@ -51,7 +51,7 @@ We are delighted to announce that the WePOINTS family has welcomed a new member:
 ## Results
-We take the following results from [OmniDocBench](https://github.com/opendatalab/OmniDocBench/tree/main) and POINTS-Reader for comparison:
+For comparison, we use the results reported by [OmniDocBench](https://github.com/opendatalab/OmniDocBench/tree/main) and POINTS-Reader. Compared with the version submitted to EMNLP 2025, the current release provides (1) improved performance and (2) support for Chinese documents. Both enhancements build upon the methods proposed in this paper.
 <table style="width: 92%; margin: auto; border-collapse: collapse;">
 <thead>
@@ -607,9 +607,9 @@ prompt = (
 image_path = '/path/to/your/local/image'
 model_path = 'tencent/POINTS-Reader'
 model = AutoModelForCausalLM.from_pretrained(model_path,
-                                                    trust_remote_code=True,
-                                                    torch_dtype=torch.float16,
-                                                    device_map='cuda')
+                                             trust_remote_code=True,
+                                             torch_dtype=torch.float16,
+                                             device_map='cuda')
 tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
 image_processor = Qwen2ImageProcessorForPOINTSV15.from_pretrained(model_path)
 content = [
@@ -647,8 +647,8 @@ We will create a Pull Request to SGLang, please stay tuned.
 ## Known Issues
-- **Complex Document Parsing**: POINTS-Reader can struggle with complex layouts (e.g., newspapers), often producing repeated or missing content.
-- **Handwritten Document Parsing**: It also has difficulty handling handwritten inputs (e.g., receipts, notes), which can lead to recognition errors or omissions.
+- **Complex Document Parsing**: POINTS-Reader can struggle with complex layouts (e.g., newspapers), often producing repeated or missing content.
+- **Handwritten Document Parsing**: It also has difficulty handling handwritten inputs (e.g., receipts, notes), which can lead to recognition errors or omissions.
 - **Multi-language Document Parsing**: POINTS-Reader currently supports only English and Chinese, limiting its effectiveness on other languages.
 ## Citation