Qwen2-VL-72B-Instruct
Multimodal
Zero-eval
#1 DocVQAtest
#1 VCR_en_easy
#1 MMBench_test
+10 more
by Alibaba Cloud / Qwen Team
About
Qwen2-VL-72B is a large vision-language model designed for multimodal tasks that combine visual and textual understanding. With 72 billion parameters for integrated vision and language processing, it supports applications that require sophisticated analysis of images alongside text.
Timeline
Announced: Aug 29, 2024
Released: Aug 29, 2024
Knowledge Cutoff: Jun 30, 2023
Specifications
Capabilities
Multimodal
License & Family
License
tongyi-qianwen
Performance Overview
Performance metrics and category breakdown
Overall Performance
15 benchmarks
Average Score: 75.8%
Best Score: 96.5%
High Performers (80%+): 7+
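The summary figures above can be reproduced from the per-benchmark table below. A minimal sketch, assuming the ten scores listed on this page (the reported 75.8% average covers all 15 benchmarks, so the visible subset averages higher):

```python
# Scores for the 10 benchmarks shown on this page (out of 15 total).
scores = {
    "DocVQAtest": 96.5,
    "VCR_en_easy": 91.9,
    "ChartQA": 88.3,
    "OCRBench": 87.7,
    "MMBench_test": 86.5,
    "TextVQA": 85.5,
    "InfoVQAtest": 84.5,
    "EgoSchema": 77.9,
    "RealWorldQA": 77.8,
    "MMVetGPT4Turbo": 74.0,
}

# Aggregate statistics over the visible subset.
average = sum(scores.values()) / len(scores)        # mean of the 10 listed scores
best = max(scores.values())                         # best score, matches 96.5%
high_performers = sum(1 for s in scores.values() if s >= 80.0)  # 80%+ count

print(round(average, 2), best, high_performers)  # → 85.06 96.5 7
```

The "7+" high-performer count and the 96.5% best score match the visible rows; the 75.8% overall average depends on the five benchmarks not shown here.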
All Benchmark Results for Qwen2-VL-72B-Instruct
Complete list of benchmark scores with detailed information
| Benchmark | Category | Raw Score | Normalized | Source |
|---|---|---|---|---|
| DocVQAtest | multimodal | 0.96 | 96.5% | Self-reported |
| VCR_en_easy | multimodal | 0.92 | 91.9% | Self-reported |
| ChartQA | multimodal | 0.88 | 88.3% | Self-reported |
| OCRBench | multimodal | 0.88 | 87.7% | Self-reported |
| MMBench_test | multimodal | 0.86 | 86.5% | Self-reported |
| TextVQA | multimodal | 0.85 | 85.5% | Self-reported |
| InfoVQAtest | multimodal | 0.84 | 84.5% | Self-reported |
| EgoSchema | video | 0.78 | 77.9% | Self-reported |
| RealWorldQA | multimodal | 0.78 | 77.8% | Self-reported |
| MMVetGPT4Turbo | multimodal | 0.74 | 74.0% | Self-reported |
Showing 10 of 15 benchmarks.