Alibaba Cloud / Qwen Team

Qwen2.5 VL 72B Instruct

Multimodal
Zero-eval
#1 DocVQA
#1 Android Control Low_EM
#1 OCRBench

About

Qwen2.5-VL 72B is the flagship vision-language model in the Qwen2.5 series, designed for advanced multimodal understanding. With 72 billion parameters tuned for visual and textual reasoning, it is Qwen's most capable offering for tasks that combine image and language processing.

Timeline
Announced: Jan 26, 2025
Released: Jan 26, 2025
Specifications
Capabilities: Multimodal
License & Family
License: tongyi-qianwen
Performance Overview
Performance metrics and category breakdown

Overall Performance

Benchmarks evaluated: 30
Average score: 66.9%
Best score: 96.4%
High performers (80%+): 8
Benchmark Results for Qwen2.5 VL 72B Instruct
Benchmark scores sorted from highest to lowest, with category and score source
Benchmark                Category    Score   Source
DocVQA                   multimodal  96.4%   Self-reported
Android Control Low_EM   multimodal  93.7%   Self-reported
ChartQA                  multimodal  89.5%   Self-reported
OCRBench                 multimodal  88.5%   Self-reported
AI2D                     multimodal  88.4%   Self-reported
MMBench                  multimodal  88.0%   Self-reported
ScreenSpot               multimodal  87.1%   Self-reported
AITZ_EM                  multimodal  83.2%   Self-reported
CC-OCR                   multimodal  79.8%   Self-reported
EgoSchema                video       76.2%   Self-reported
Showing 1 to 10 of 30 benchmarks
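As a quick consistency check, the Python sketch below recounts the "High performers (80%+)" figure and the best score from the ten rows listed above. It assumes the listing is sorted in descending order of score, as it appears to be, so the 20 unlisted benchmarks all fall below the tenth entry (79.8%) and cannot add to the count. The dictionary and variable names are illustrative only.

```python
# Scores (in %) for the ten benchmarks listed above, copied from the table.
listed_scores = {
    "DocVQA": 96.4,
    "Android Control Low_EM": 93.7,
    "ChartQA": 89.5,
    "OCRBench": 88.5,
    "AI2D": 88.4,
    "MMBench": 88.0,
    "ScreenSpot": 87.1,
    "AITZ_EM": 83.2,
    "CC-OCR": 79.8,
    "EgoSchema": 76.2,
}

# The "80%+" cutoff used in the overall performance summary.
HIGH_PERFORMER_THRESHOLD = 80.0

# Benchmarks at or above the cutoff.
high_performers = [
    name for name, score in listed_scores.items()
    if score >= HIGH_PERFORMER_THRESHOLD
]

# Benchmark with the highest listed score.
best = max(listed_scores, key=listed_scores.get)

print(len(high_performers))       # 8
print(best, listed_scores[best])  # DocVQA 96.4
```

Both results agree with the summary figures (8 high performers, best score 96.4%). The 66.9% average cannot be checked the same way, because it covers all 30 benchmarks and only 10 are listed here.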