Qwen2.5 VL 32B Instruct
Multimodal · Zero-eval
#1 on ScreenSpot, InfoVQA, Android Control High_EM, and 15 more benchmarks
by Alibaba Cloud / Qwen Team
About
Qwen2.5-VL 32B was developed as a mid-sized vision-language model that balances multimodal capability with practical deployment. With 32 billion parameters spanning its vision and language components, it targets applications that need strong visual understanding without flagship-scale compute.
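The checkpoint is distributed in the usual Hugging Face format under the repo name Qwen/Qwen2.5-VL-32B-Instruct. Below is a minimal inference sketch, assuming the transformers Qwen2_5_VLForConditionalGeneration integration and the qwen_vl_utils helper package published alongside the Qwen2.5-VL series; the image URL and prompt are placeholders, not part of this page.

```python
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
from qwen_vl_utils import process_vision_info  # helper for extracting image/video inputs

MODEL_ID = "Qwen/Qwen2.5-VL-32B-Instruct"  # assumed Hugging Face repo name

# Load the weights and the matching processor (tokenizer + image preprocessor).
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    MODEL_ID, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(MODEL_ID)

# One user turn containing an image plus a text instruction (placeholder content).
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "https://example.com/document.png"},
            {"type": "text", "text": "Summarize the key figures in this document."},
        ],
    }
]

# Render the chat template and collect the vision inputs referenced in the messages.
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text], images=image_inputs, videos=video_inputs,
    padding=True, return_tensors="pt",
).to(model.device)

# Generate, strip the prompt tokens from each sequence, then decode the completion.
generated = model.generate(**inputs, max_new_tokens=256)
trimmed = [out[len(inp):] for inp, out in zip(inputs.input_ids, generated)]
print(processor.batch_decode(trimmed, skip_special_tokens=True)[0])
```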
Timeline
Announced: Feb 28, 2025
Released: Feb 28, 2025
Specifications
Capabilities
Multimodal
License & Family
License: Apache 2.0
Performance Overview
Performance metrics and category breakdown
Overall Performance (28 benchmarks)
Average Score: 63.6%
Best Score: 94.8%
High Performers (80%+): 8+
All Benchmark Results for Qwen2.5 VL 32B Instruct
Complete list of benchmark scores with detailed information
| Benchmark | Category | Raw Score | Score (%) | Source |
|---|---|---|---|---|
| DocVQA | multimodal | 0.95 | 94.8% | Self-reported |
| Android Control Low_EM | multimodal | 0.93 | 93.3% | Self-reported |
| HumanEval | text | 0.92 | 91.5% | Self-reported |
| ScreenSpot | multimodal | 0.89 | 88.5% | Self-reported |
| MBPP | text | 0.84 | 84.0% | Self-reported |
| InfoVQA | multimodal | 0.83 | 83.4% | Self-reported |
| AITZ_EM | multimodal | 0.83 | 83.1% | Self-reported |
| MATH | text | 0.82 | 82.2% | Self-reported |
| MMLU | text | 0.78 | 78.4% | Self-reported |
| VideoMME w sub. | multimodal | 0.78 | 77.9% | Self-reported |
Showing 1 to 10 of 28 benchmarks
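The summary figures in the Performance Overview are plain aggregates of the per-benchmark percentages. The sketch below reproduces that arithmetic over the ten rows listed here; the page's own numbers (63.6% average, 94.8% best, 8+ high performers) are computed over all 28 benchmarks, so the average of this top-10 subset comes out higher.

```python
# Per-benchmark scores for the ten rows shown above (percent).
scores = {
    "DocVQA": 94.8,
    "Android Control Low_EM": 93.3,
    "HumanEval": 91.5,
    "ScreenSpot": 88.5,
    "MBPP": 84.0,
    "InfoVQA": 83.4,
    "AITZ_EM": 83.1,
    "MATH": 82.2,
    "MMLU": 78.4,
    "VideoMME w sub.": 77.9,
}

average = sum(scores.values()) / len(scores)              # subset average only
best = max(scores.values())                               # 94.8 (DocVQA)
high_performers = [n for n, s in scores.items() if s >= 80.0]  # 8 of the 10 shown

print(f"average over listed benchmarks: {average:.1f}%")
print(f"best score: {best:.1f}%")
print(f"high performers (80%+): {len(high_performers)}")
```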