Qwen2.5 VL 72B Instruct
Multimodal
Zero-eval
#1 DocVQA
#1 Android Control Low_EM
#1 OCRBench
+24 more
by Alibaba Cloud / Qwen Team
About
Qwen2.5-VL 72B was created as the flagship vision-language model in the Qwen 2.5 series, designed to provide advanced multimodal understanding. Built with 72 billion parameters optimized for visual and textual reasoning, it represents Qwen's most capable offering for tasks requiring integrated image and language processing.
Timeline
Announced: Jan 26, 2025
Released: Jan 26, 2025
Specifications
Capabilities
Multimodal
License & Family
License
tongyi-qianwen
Performance Overview
Performance metrics and category breakdown
Overall Performance
30 benchmarks
Average Score
66.9%
Best Score
96.4%
High Performers (80%+)
8+
All Benchmark Results for Qwen2.5 VL 72B Instruct
Complete list of benchmark scores with detailed information
| Benchmark | Category | Raw Score | Percentage | Source |
|---|---|---|---|---|
| DocVQA | multimodal | 0.96 | 96.4% | Self-reported |
| Android Control Low_EM | multimodal | 0.94 | 93.7% | Self-reported |
| ChartQA | multimodal | 0.90 | 89.5% | Self-reported |
| OCRBench | multimodal | 0.89 | 88.5% | Self-reported |
| AI2D | multimodal | 0.88 | 88.4% | Self-reported |
| MMBench | multimodal | 0.88 | 88.0% | Self-reported |
| ScreenSpot | multimodal | 0.87 | 87.1% | Self-reported |
| AITZ_EM | multimodal | 0.83 | 83.2% | Self-reported |
| CC-OCR | multimodal | 0.80 | 79.8% | Self-reported |
| EgoSchema | video | 0.76 | 76.2% | Self-reported |
Showing 1 to 10 of 30 benchmarks
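The summary figures above can be reproduced from the table. A minimal sketch, using only the ten rows shown here (the card's 66.9% average is over all 30 benchmarks, so the mean below will differ):

```python
# Aggregate the ten benchmark rows visible in the table above.
# Scores are the "Percentage" column; the full card covers 30 benchmarks.
scores = {
    "DocVQA": 96.4,
    "Android Control Low_EM": 93.7,
    "ChartQA": 89.5,
    "OCRBench": 88.5,
    "AI2D": 88.4,
    "MMBench": 88.0,
    "ScreenSpot": 87.1,
    "AITZ_EM": 83.2,
    "CC-OCR": 79.8,
    "EgoSchema": 76.2,
}

# Mean over the visible rows only (not the card's 30-benchmark average).
average = sum(scores.values()) / len(scores)

# Benchmarks scoring 80% or higher, matching the "High Performers (80%+)" stat.
high_performers = [name for name, s in scores.items() if s >= 80.0]

print(f"Average over shown rows: {average:.2f}%")
print(f"Benchmarks at 80%+: {len(high_performers)}")
```

Eight of the ten visible rows clear 80%, consistent with the card's "High Performers (80%+): 8+" figure, and the best visible score (DocVQA, 96.4%) matches the reported Best Score.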