Alibaba Cloud / Qwen Team

Qwen2.5 VL 7B Instruct

Multimodal
Zero-eval
#1 MobileMiniWob++_SR
#1 MLVU
#1 LongVideoBench
+21 more

About

Qwen2.5-VL 7B is a 7-billion-parameter vision-language model that processes visual and textual input together. It is designed for efficiency and targets applications that need practical multimodal understanding under constrained compute resources.

Timeline
Announced: Jan 26, 2025
Released: Jan 26, 2025
Specifications
Capabilities: Multimodal
License & Family
License: Apache 2.0
Performance Overview
Performance metrics and category breakdown

Overall Performance

32 benchmarks
Average Score: 64.5%
Best Score: 95.7%
High Performers (80%+): 10
All Benchmark Results for Qwen2.5 VL 7B Instruct
Complete list of benchmark scores with detailed information
Benchmark               Category    Score  Score (%)  Source
DocVQA                  multimodal  0.96   95.7%      Self-reported
MobileMiniWob++_SR      multimodal  0.91   91.4%      Self-reported
Android Control Low_EM  multimodal  0.91   91.4%      Self-reported
ChartQA                 multimodal  0.87   87.3%      Self-reported
OCRBench                multimodal  0.86   86.4%      Self-reported
TextVQA                 multimodal  0.85   84.9%      Self-reported
ScreenSpot              multimodal  0.85   84.7%      Self-reported
MMBench                 multimodal  0.84   84.3%      Self-reported
InfoVQA                 multimodal  0.83   82.6%      Self-reported
AITZ_EM                 multimodal  0.82   81.9%      Self-reported
Showing 1 to 10 of 32 benchmarks