
Qwen2.5 VL 72B Instruct
Multimodal
Zero-eval
#1 DocVQA
#1 Android Control Low_EM
#1 OCRBench
by Alibaba Cloud / Qwen Team
About
Qwen2.5 VL 72B Instruct is a multimodal language model developed by the Alibaba Cloud / Qwen Team. Across 30 benchmarks it averages 66.9%, with its strongest results on DocVQA (96.4%), Android Control Low_EM (93.7%), and ChartQA (89.5%). As a multimodal model, it accepts text, images, and other input formats. It was released on January 26, 2025.
Timeline
Announced: Jan 26, 2025
Released: Jan 26, 2025
Specifications
Capabilities
Multimodal
License & Family
License
tongyi-qianwen
Performance Overview
Performance metrics and category breakdown
Overall Performance
30 benchmarks
Average Score
66.9%
Best Score
96.4%
High Performers (80%+)
8+
All Benchmark Results for Qwen2.5 VL 72B Instruct
Complete list of benchmark scores with detailed information
Benchmark | Category | Score (0-1) | Score (%) | Source
DocVQA | multimodal | 0.96 | 96.4% | Self-reported
Android Control Low_EM | multimodal | 0.94 | 93.7% | Self-reported
ChartQA | multimodal | 0.90 | 89.5% | Self-reported
OCRBench | multimodal | 0.89 | 88.5% | Self-reported
AI2D | multimodal | 0.88 | 88.4% | Self-reported
MMBench | multimodal | 0.88 | 88.0% | Self-reported
ScreenSpot | multimodal | 0.87 | 87.1% | Self-reported
AITZ_EM | multimodal | 0.83 | 83.2% | Self-reported
CC-OCR | multimodal | 0.80 | 79.8% | Self-reported
EgoSchema | video | 0.76 | 76.2% | Self-reported
Showing 1 to 10 of 30 benchmarks
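
The summary figures under Overall Performance (best score, average, count at 80% or above) can be recomputed from benchmark rows. The sketch below does this for the ten rows listed on this page only; the dictionary name `listed_scores` and the helper `summarize` are illustrative, not part of any published tooling. Because the other 20 benchmarks are not shown here, the mean it produces is not the 66.9% reported for all 30.

```python
from statistics import mean

# Scores (percent) copied from the ten rows listed on this page.
# The remaining 20 benchmarks are not shown, so they are absent here.
listed_scores = {
    "DocVQA": 96.4,
    "Android Control Low_EM": 93.7,
    "ChartQA": 89.5,
    "OCRBench": 88.5,
    "AI2D": 88.4,
    "MMBench": 88.0,
    "ScreenSpot": 87.1,
    "AITZ_EM": 83.2,
    "CC-OCR": 79.8,
    "EgoSchema": 76.2,
}


def summarize(scores, threshold=80.0):
    """Return the best benchmark, the mean score, and names at or above threshold."""
    best_name = max(scores, key=scores.get)
    high = [name for name, value in scores.items() if value >= threshold]
    return {
        "best": (best_name, scores[best_name]),
        "mean": round(mean(scores.values()), 2),
        "high_performers": high,
    }


summary = summarize(listed_scores)
print(summary["best"])                   # best-scoring listed benchmark
print(summary["mean"])                   # mean of the ten listed rows only
print(len(summary["high_performers"]))   # rows at 80% or above
```

For these ten rows the best score is DocVQA at 96.4% and eight rows reach the 80% threshold, which agrees with the Best Score and High Performers figures. The mean of the listed rows is about 87.1%, well above the 30-benchmark average, since the rows shown appear to be the highest-scoring ones.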