Alibaba Cloud / Qwen Team

Qwen2.5 VL 32B Instruct

Multimodal
Zero-eval
#1 ScreenSpot
#1 InfoVQA
#1 Android Control High_EM
+16 more


About

Qwen2.5 VL 32B Instruct is a multimodal language model developed by Alibaba Cloud / Qwen Team. Its average score across 28 benchmarks is 63.6%, and its strongest self-reported results are on DocVQA (94.8%), Android Control Low_EM (93.3%), and HumanEval (91.5%). As a multimodal model, it accepts text, images, and other input formats. It is released under the Apache 2.0 license, which permits commercial use and makes it suitable for enterprise applications. It was released in 2025.

Timeline
Announced: Feb 28, 2025
Released: Feb 28, 2025
Specifications
Capabilities: Multimodal
License & Family
License: Apache 2.0
Performance Overview
Performance metrics and category breakdown

Overall Performance

Benchmarks: 28
Average Score: 63.6%
Best Score: 94.8%
High Performers (80%+): 8
All Benchmark Results for Qwen2.5 VL 32B Instruct
Complete list of benchmark scores with detailed information
Benchmark               Modality    Score (0-1)  Score (%)  Source
DocVQA                  multimodal  0.95         94.8%      Self-reported
Android Control Low_EM  multimodal  0.93         93.3%      Self-reported
HumanEval               text        0.92         91.5%      Self-reported
ScreenSpot              multimodal  0.89         88.5%      Self-reported
MBPP                    text        0.84         84.0%      Self-reported
InfoVQA                 multimodal  0.83         83.4%      Self-reported
AITZ_EM                 multimodal  0.83         83.1%      Self-reported
MATH                    text        0.82         82.2%      Self-reported
MMLU                    text        0.78         78.4%      Self-reported
VideoMME w sub.         multimodal  0.78         77.9%      Self-reported
Showing 1 to 10 of 28 benchmarks
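
The overview figures (best score, number of results at 80% or above) can be checked against the ten results listed on this page. The Python sketch below is only an illustration: the names `RESULTS`, `summarize`, and `average_by_modality` are hypothetical, and how the site actually aggregates scores is an assumption. Because only 10 of the 28 benchmarks are shown here, the average computed from these rows will be higher than the published 63.6%, which covers all 28.

```python
from collections import defaultdict
from statistics import mean

# (benchmark, modality, score in percent) for the ten results shown on this page.
# The remaining 18 benchmarks are not listed here, so they are not included.
RESULTS = [
    ("DocVQA", "multimodal", 94.8),
    ("Android Control Low_EM", "multimodal", 93.3),
    ("HumanEval", "text", 91.5),
    ("ScreenSpot", "multimodal", 88.5),
    ("MBPP", "text", 84.0),
    ("InfoVQA", "multimodal", 83.4),
    ("AITZ_EM", "multimodal", 83.1),
    ("MATH", "text", 82.2),
    ("MMLU", "text", 78.4),
    ("VideoMME w sub.", "multimodal", 77.9),
]


def summarize(results, threshold=80.0):
    """Return count, average, best result, and number of scores >= threshold."""
    scores = [score for _, _, score in results]
    best_name, _, best_score = max(results, key=lambda row: row[2])
    return {
        "count": len(scores),
        "average": round(mean(scores), 1),
        "best": (best_name, best_score),
        "high_performers": sum(score >= threshold for score in scores),
    }


def average_by_modality(results):
    """Group scores by modality and return the rounded mean of each group."""
    groups = defaultdict(list)
    for _, modality, score in results:
        groups[modality].append(score)
    return {modality: round(mean(scores), 1) for modality, scores in groups.items()}


if __name__ == "__main__":
    print(summarize(RESULTS))
    print(average_by_modality(RESULTS))
```

On these ten rows, the best result is DocVQA at 94.8% and eight results are at or above 80%, which matches the "Best Score" and "High Performers (80%+)" figures in the overview.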