## Evaluation Results

All evaluations were conducted using [lmms_eval](https://github.com/EvolvingLMMs-Lab/lmms-eval).

| Benchmark                  | **LLaVA-OV-1.5-8B** | **Qwen2.5 VL 7B** |
|:---------------------------|:-------------------:|:-----------------:|
| MMMU (Validation)          | **55.44**           | 51.33             |
| MMMU-Pro (Standard)        | **37.40**           | 36.30             |
| MMMU-Pro (Vision)          | 25.15               | **32.83**         |
| MMBench (English; Test)    | **84.14**           | 83.40             |
| MMBench (Chinese; Test)    | 81.00               | **81.61**         |
| MME-RealWorld (English)    | **62.31**           | 57.33             |
| MME-RealWorld (Chinese)    | **56.11**           | 51.50             |
| AI2D (With Mask)           | **84.16**           | 82.58             |
| AI2D (Without Mask)        | **94.11**           | 93.36             |
| CV-Bench                   | **80.82**           | 79.95             |
| VL-RewardBench             | 45.90               | **49.65**         |
| V*                         | **78.01**           | 76.96             |
| PixmoCount                 | 62.19               | **63.33**         |
| CountBench                 | **88.19**           | 86.35             |
| ChartQA                    | **86.48**           | 84.08             |
| CharXiv (Direct Questions) | **74.10**           | 69.80             |
| DocVQA (Test)              | **95.00**           | 94.93             |
| InfoVQA (Test)             | 78.42               | **81.67**         |
| WeMath                     | **33.62**           | 33.33             |
| MathVista (Mini)           | **69.57**           | 68.60             |
| MathVision                 | **25.56**           | 22.37             |
| MMStar                     | **67.72**           | 62.54             |
| SEED-Bench (Image)         | 77.32               | **77.53**         |
| ScienceQA                  | **94.98**           | 88.75             |
| SEED-Bench 2-Plus          | 69.21               | **70.93**         |
| OCRBench                   | 82.90               | **84.20**         |
| RealWorldQA                | 68.10               | **68.50**         |
### Using 🤗 Transformers to Chat

Below is a code snippet showing how to chat with the model using `transformers` and `qwen_vl_utils`:
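The snippet itself did not survive extraction, so the sketch below follows the usual Qwen2.5-VL-style chat flow. The repo id, the `AutoModelForImageTextToText` loader class, and the image URL are assumptions — substitute whatever the model card specifies:

```python
MODEL_ID = "lmms-lab/LLaVA-OneVision-1.5-8B-Instruct"  # hypothetical repo id


def build_messages(image_url: str, question: str) -> list[dict]:
    """Assemble one user turn in the chat format expected by qwen_vl_utils."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image_url},
                {"type": "text", "text": question},
            ],
        }
    ]


def chat(image_url: str, question: str, max_new_tokens: int = 256) -> str:
    # Heavy imports kept local so the message builder stays importable
    # without transformers / qwen_vl_utils installed.
    from transformers import AutoModelForImageTextToText, AutoProcessor
    from qwen_vl_utils import process_vision_info

    model = AutoModelForImageTextToText.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    processor = AutoProcessor.from_pretrained(MODEL_ID)

    messages = build_messages(image_url, question)
    # Render the chat template to a prompt string, then pull out the
    # vision inputs referenced by the messages.
    text = processor.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    image_inputs, video_inputs = process_vision_info(messages)
    inputs = processor(
        text=[text],
        images=image_inputs,
        videos=video_inputs,
        padding=True,
        return_tensors="pt",
    ).to(model.device)

    generated = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Strip the prompt tokens before decoding the reply.
    trimmed = [out[len(inp):] for inp, out in zip(inputs.input_ids, generated)]
    return processor.batch_decode(trimmed, skip_special_tokens=True)[0]
```

A call like `chat("https://example.com/demo.jpg", "Describe this image.")` then returns the model's answer as a string.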