Qwen/Qwen2.5-VL-3B-Instruct


shuai bai committed on Jan 27

Commit 9c1a731 · verified · 1 Parent(s): 37ce9f6

Update README.md

Files changed (1)

README.md +3 -1


@@ -428,7 +428,7 @@ The model supports a wide range of resolution inputs. By default, it uses the na
 min_pixels = 256 * 28 * 28
 max_pixels = 1280 * 28 * 28
 processor = AutoProcessor.from_pretrained(
-    "Qwen/Qwen2.5-VL-7B-Instruct", min_pixels=min_pixels, max_pixels=max_pixels
+    "Qwen/Qwen2.5-VL-3B-Instruct", min_pixels=min_pixels, max_pixels=max_pixels
 )
 ```
@@ -478,6 +478,7 @@ To handle extensive inputs exceeding 32,768 tokens, we utilize [YaRN](https://ar
 For supported frameworks, you could add the following to `config.json` to enable YaRN:
+```
 {
 	...,
     "type": "yarn",
@@ -489,6 +490,7 @@ For supported frameworks, you could add the following to `config.json` to enable
     "factor": 4,
     "original_max_position_embeddings": 32768
 }
+```
 However, it should be noted that this method has a significant impact on the performance of temporal and spatial localization tasks, and is therefore not recommended for use.
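
As a rough illustration of what the values in this diff imply, the sketch below computes the nominal context window after YaRN scaling and the number of 28 x 28 pixel blocks covered by `min_pixels` and `max_pixels`. The helper names are hypothetical and not part of the README. The enclosing key of the YaRN fragment is elided in the diff (`...`) and is deliberately left out here. The sketch assumes the usual YaRN reading, in which `factor` multiplies `original_max_position_embeddings`.

```python
# Values copied from the diff; helper names are illustrative only.
YARN_FRAGMENT = {
    "type": "yarn",
    "factor": 4,
    "original_max_position_embeddings": 32768,
}

BLOCK_SIDE = 28  # side length used in the README's min_pixels/max_pixels products


def scaled_context_length(fragment: dict) -> int:
    """Nominal context window: factor * original_max_position_embeddings."""
    return int(fragment["factor"] * fragment["original_max_position_embeddings"])


def pixel_budget_in_blocks(pixels: int, side: int = BLOCK_SIDE) -> int:
    """How many side x side pixel blocks fit in a pixel budget."""
    return pixels // (side * side)


min_pixels = 256 * 28 * 28
max_pixels = 1280 * 28 * 28

print(scaled_context_length(YARN_FRAGMENT))  # 131072
print(pixel_budget_in_blocks(min_pixels))    # 256
print(pixel_budget_in_blocks(max_pixels))    # 1280
```

Under that assumption, the fragment targets inputs of up to 131,072 tokens, four times the 32,768-token threshold mentioned in the hunk header.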