
Qwen2.5 VL 7B Instruct
Multimodal
Zero-eval
#1 MobileMiniWob++_SR
#1 MLVU
#1 LongVideoBench
(+21 more)
by Alibaba Cloud / Qwen Team
About
Qwen2.5 VL 7B Instruct is a multimodal language model developed by Alibaba Cloud / Qwen Team. Across 32 benchmarks it averages 64.5% (all scores self-reported). Its strongest results are on DocVQA (95.7%), MobileMiniWob++_SR (91.4%), and Android Control Low_EM (91.4%). It accepts image and video inputs alongside text. It is released under the Apache 2.0 license, which permits commercial use. It was released on January 26, 2025.
Timeline
Announced: Jan 26, 2025
Released: Jan 26, 2025
Specifications
Capabilities: Multimodal
License & Family
License: Apache 2.0
Performance Overview
Overall Performance (32 benchmarks)
Average Score: 64.5%
Best Score: 95.7% (DocVQA)
High Performers (80%+): 10+ benchmarks
All Benchmark Results for Qwen2.5 VL 7B Instruct
Benchmark scores, sorted from highest to lowest (all self-reported by the Qwen Team):
Benchmark | Category | Normalized | Score | Source
DocVQA | multimodal | 0.96 | 95.7% | Self-reported
MobileMiniWob++_SR | multimodal | 0.91 | 91.4% | Self-reported
Android Control Low_EM | multimodal | 0.91 | 91.4% | Self-reported
ChartQA | multimodal | 0.87 | 87.3% | Self-reported
OCRBench | multimodal | 0.86 | 86.4% | Self-reported
TextVQA | multimodal | 0.85 | 84.9% | Self-reported
ScreenSpot | multimodal | 0.85 | 84.7% | Self-reported
MMBench | multimodal | 0.84 | 84.3% | Self-reported
InfoVQA | multimodal | 0.83 | 82.6% | Self-reported
AITZ_EM | multimodal | 0.82 | 81.9% | Self-reported
Showing 10 of 32 benchmarks.
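The overview figures (average, best score, and the 80%+ count) are summaries of the per-benchmark scores. The following is a minimal Python sketch of that arithmetic applied only to the 10 rows listed here. It assumes the site's average is an unweighted mean, and that the "Normalized" column is the percentage divided by 100 and rounded to two decimals. `summarize` and `normalize` are hypothetical helper names for illustration, not part of any published tooling.

```python
from statistics import mean

# Self-reported scores (percent) for the 10 benchmarks listed in the table.
# The other 22 of the 32 benchmarks are not listed, so the overall 64.5%
# average cannot be reproduced from this subset.
SCORES = {
    "DocVQA": 95.7,
    "MobileMiniWob++_SR": 91.4,
    "Android Control Low_EM": 91.4,
    "ChartQA": 87.3,
    "OCRBench": 86.4,
    "TextVQA": 84.9,
    "ScreenSpot": 84.7,
    "MMBench": 84.3,
    "InfoVQA": 82.6,
    "AITZ_EM": 81.9,
}


def summarize(scores: dict[str, float], threshold: float = 80.0) -> dict[str, float]:
    """Hypothetical helper: unweighted mean, best score, and count >= threshold."""
    values = list(scores.values())
    return {
        "average": round(mean(values), 2),
        "best": max(values),
        # Booleans sum as 0/1, giving the number of scores at or above threshold.
        "high_performers": sum(v >= threshold for v in values),
    }


def normalize(percent: float) -> float:
    """Hypothetical helper: convert a percentage to a 0-1 fraction, 2 decimals."""
    return round(percent / 100, 2)


if __name__ == "__main__":
    print(summarize(SCORES))
    print({name: normalize(p) for name, p in SCORES.items()})
```

On these 10 rows the mean is about 87.1%, well above the 64.5% reported across all 32 benchmarks. That gap is expected, because the rows shown are the highest-scoring ones and the remaining 22 are omitted.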