
Qwen2-VL-72B-Instruct
Multimodal
Zero-eval
#1 DocVQAtest
#1 VCR_en_easy
#1 MMBench_test
+10 more
by Alibaba Cloud / Qwen Team
About
Qwen2-VL-72B-Instruct is a multimodal language model developed by the Qwen Team at Alibaba Cloud. It has an average score of 75.8% across 15 benchmarks. Its strongest results are on DocVQAtest (96.5%), VCR_en_easy (91.9%), and ChartQA (88.3%). As a multimodal model, it processes text and images, and its results also include a video benchmark (EgoSchema). It was released in 2024.
Timeline
Announced: Aug 29, 2024
Released: Aug 29, 2024
Knowledge Cutoff: Jun 30, 2023
Specifications
Capabilities
Multimodal
License & Family
License
tongyi-qianwen
Performance Overview
Performance metrics and category breakdown
Overall Performance
15 benchmarks
Average Score: 75.8%
Best Score: 96.5%
High Performers (80%+): 7+
All Benchmark Results for Qwen2-VL-72B-Instruct
Complete list of benchmark scores with detailed information
Benchmark | Category | Score | Percentage | Source
DocVQAtest | multimodal | 0.96 | 96.5% | Self-reported
VCR_en_easy | multimodal | 0.92 | 91.9% | Self-reported
ChartQA | multimodal | 0.88 | 88.3% | Self-reported
OCRBench | multimodal | 0.88 | 87.7% | Self-reported
MMBench_test | multimodal | 0.86 | 86.5% | Self-reported
TextVQA | multimodal | 0.85 | 85.5% | Self-reported
InfoVQAtest | multimodal | 0.84 | 84.5% | Self-reported
EgoSchema | video | 0.78 | 77.9% | Self-reported
RealWorldQA | multimodal | 0.78 | 77.8% | Self-reported
MMVetGPT4Turbo | multimodal | 0.74 | 74.0% | Self-reported
Showing 1 to 10 of 15 benchmarks
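Because only 10 of the 15 benchmarks are listed, the summary figures cannot be fully reproduced from the table alone. The following is a minimal sketch of how the listed scores relate to the reported 75.8% average. It assumes the average is an unweighted mean of the percentage scores, which the page does not confirm. The function names and constants are illustrative and not part of any published tooling.

```python
# Percentage scores for the 10 benchmarks listed on the page.
LISTED_SCORES = {
    "DocVQAtest": 96.5,
    "VCR_en_easy": 91.9,
    "ChartQA": 88.3,
    "OCRBench": 87.7,
    "MMBench_test": 86.5,
    "TextVQA": 85.5,
    "InfoVQAtest": 84.5,
    "EgoSchema": 77.9,
    "RealWorldQA": 77.8,
    "MMVetGPT4Turbo": 74.0,
}

REPORTED_AVERAGE = 75.8  # average across all benchmarks, as stated on the page
TOTAL_BENCHMARKS = 15    # total benchmark count, as stated on the page


def mean(values):
    """Unweighted arithmetic mean (assumed to match the page's averaging)."""
    values = list(values)
    return sum(values) / len(values)


def high_performers(scores, threshold=80.0):
    """Names of benchmarks scoring at or above the threshold (in percent)."""
    return [name for name, score in scores.items() if score >= threshold]


def implied_mean_of_unlisted(listed, reported_average, total):
    """Mean the unlisted benchmarks would need for the reported average to hold.

    Only meaningful under the unweighted-mean assumption; rounding in the
    reported figures makes the result approximate.
    """
    remaining = total - len(listed)
    implied_sum = reported_average * total - sum(listed.values())
    return implied_sum / remaining


if __name__ == "__main__":
    print(f"Mean of listed scores: {mean(LISTED_SCORES.values()):.2f}%")
    print(f"Listed benchmarks at 80%+: {len(high_performers(LISTED_SCORES))}")
    unlisted = implied_mean_of_unlisted(
        LISTED_SCORES, REPORTED_AVERAGE, TOTAL_BENCHMARKS
    )
    print(f"Implied mean of the 5 unlisted benchmarks: {unlisted:.2f}%")
```

Under that assumption, the 10 listed scores average about 85.1%, and 7 of them are at or above 80%. The five unlisted benchmarks would then need to average roughly 57% for the overall figure to come out at 75.8%.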