Alibaba Cloud / Qwen Team

Qwen2.5 VL 32B Instruct

Multimodal
Zero-eval
#1 ScreenSpot
#1 InfoVQA
#1 Android Control High_EM
+15 more

About

Qwen2.5-VL 32B is a mid-sized vision-language model designed to balance multimodal capability with practical deployment cost. With 32 billion parameters, it targets applications that need strong visual understanding without flagship-scale resources.

Timeline
Announced: Feb 28, 2025
Released: Feb 28, 2025
Specifications
Capabilities: Multimodal
License & Family
License: Apache 2.0
Performance Overview
Performance metrics and category breakdown

Overall Performance

Benchmarks: 28
Average Score: 63.6%
Best Score: 94.8%
High Performers (80%+): 8
All Benchmark Results for Qwen2.5 VL 32B Instruct
Complete list of benchmark scores with detailed information
Benchmark               Category    Score  Percent  Source
DocVQA                  multimodal  0.95   94.8%    Self-reported
Android Control Low_EM  multimodal  0.93   93.3%    Self-reported
HumanEval               text        0.92   91.5%    Self-reported
ScreenSpot              multimodal  0.89   88.5%    Self-reported
MBPP                    text        0.84   84.0%    Self-reported
InfoVQA                 multimodal  0.83   83.4%    Self-reported
AITZ_EM                 multimodal  0.83   83.1%    Self-reported
MATH                    text        0.82   82.2%    Self-reported
MMLU                    text        0.78   78.4%    Self-reported
VideoMME w sub.         multimodal  0.78   77.9%    Self-reported
Showing 1 to 10 of 28 benchmarks