Alibaba Cloud / Qwen Team

Qwen2.5 VL 72B Instruct

Multimodal
Zero-eval
#1 on DocVQA
#1 on Android Control Low_EM
#1 on OCRBench

About

Qwen2.5 VL 72B Instruct is a multimodal language model developed by Alibaba Cloud / Qwen Team. Across 30 benchmarks it averages 66.9%, with its highest scores on DocVQA (96.4%), Android Control Low_EM (93.7%), and ChartQA (89.5%); all of these figures are self-reported. As a multimodal model, it accepts text, images, and video as input. It was announced and released on January 26, 2025.

Timeline
Announced: Jan 26, 2025
Released: Jan 26, 2025
Specifications
Capabilities: Multimodal
License & Family
License: tongyi-qianwen
Performance Overview
Performance metrics and category breakdown

Overall Performance

Benchmarks: 30
Average score: 66.9%
Best score: 96.4%
High performers (80%+): 8
All Benchmark Results for Qwen2.5 VL 72B Instruct
Top 10 of 30 benchmarks, sorted by score. All scores are self-reported.

Benchmark                 Category     Score    Source
DocVQA                    multimodal   96.4%    Self-reported
Android Control Low_EM    multimodal   93.7%    Self-reported
ChartQA                   multimodal   89.5%    Self-reported
OCRBench                  multimodal   88.5%    Self-reported
AI2D                      multimodal   88.4%    Self-reported
MMBench                   multimodal   88.0%    Self-reported
ScreenSpot                multimodal   87.1%    Self-reported
AITZ_EM                   multimodal   83.2%    Self-reported
CC-OCR                    multimodal   79.8%    Self-reported
EgoSchema                 video        76.2%    Self-reported
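Because the ten scores above are the highest of the 30 and the tenth (EgoSchema, 76.2%) is already below 80%, every "high performer" must appear in this list, so the overview's count of 8 and best score of 96.4% can be checked directly. The short Python sketch below does that check. It assumes "80%+" means a score at or above 80.0; the names TOP_SCORES and high_performers are hypothetical helpers for illustration. The 66.9% average cannot be recomputed here, since the remaining 20 scores are not listed.

```python
# Top 10 of the 30 self-reported scores listed above, in percent.
TOP_SCORES = {
    "DocVQA": 96.4,
    "Android Control Low_EM": 93.7,
    "ChartQA": 89.5,
    "OCRBench": 88.5,
    "AI2D": 88.4,
    "MMBench": 88.0,
    "ScreenSpot": 87.1,
    "AITZ_EM": 83.2,
    "CC-OCR": 79.8,
    "EgoSchema": 76.2,
}

# Assumed interpretation of the "High Performers (80%+)" cutoff: score >= 80.0.
HIGH_PERFORMER_THRESHOLD = 80.0


def high_performers(scores, threshold=HIGH_PERFORMER_THRESHOLD):
    """Return the names of benchmarks scoring at or above the threshold."""
    return [name for name, score in scores.items() if score >= threshold]


if __name__ == "__main__":
    hp = high_performers(TOP_SCORES)
    print(len(hp))                    # 8
    print(max(TOP_SCORES.values()))   # 96.4
```

CC-OCR (79.8%) falls just under the cutoff, which is why the count is 8 rather than 9.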