Alibaba Cloud / Qwen Team

Qwen2-VL-72B-Instruct

Multimodal
Zero-eval
#1 DocVQAtest
#1 VCR_en_easy
#1 MMBench_test
+10 more

About

Qwen2-VL 72B is a large vision-language model with 72 billion parameters. It is designed for multimodal tasks that combine visual and textual understanding, and it targets applications that need to analyze images alongside text.

Timeline
Announced: Aug 29, 2024
Released: Aug 29, 2024
Knowledge Cutoff: Jun 30, 2023
Specifications
Capabilities
Multimodal
License & Family
License
tongyi-qianwen
Performance Overview
Performance metrics and category breakdown

Overall Performance

Benchmarks: 15
Average Score: 75.8%
Best Score: 96.5%
High Performers (80%+): 7
All Benchmark Results for Qwen2-VL-72B-Instruct
Complete list of benchmark scores with detailed information
Benchmark        Category     Score   Percent   Source
DocVQAtest       multimodal   0.96    96.5%     Self-reported
VCR_en_easy      multimodal   0.92    91.9%     Self-reported
ChartQA          multimodal   0.88    88.3%     Self-reported
OCRBench         multimodal   0.88    87.7%     Self-reported
MMBench_test     multimodal   0.86    86.5%     Self-reported
TextVQA          multimodal   0.85    85.5%     Self-reported
InfoVQAtest      multimodal   0.84    84.5%     Self-reported
EgoSchema        video        0.78    77.9%     Self-reported
RealWorldQA      multimodal   0.78    77.8%     Self-reported
MMVetGPT4Turbo   multimodal   0.74    74.0%     Self-reported
Showing 1 to 10 of 15 benchmarks
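
The summary figures can be partly cross-checked against the ten rows listed above. The following is a minimal sketch, assuming the visible percentages are transcribed correctly; the names `listed_scores` and `summarize` are illustrative and do not belong to any leaderboard API. Only 10 of the 15 benchmarks are shown, so the mean computed here covers the visible rows alone and is not expected to match the 75.8% overall average.

```python
from statistics import mean

# Percent scores transcribed from the ten benchmark rows shown on this page.
# The five remaining benchmarks are not listed, so they are absent here.
listed_scores = {
    "DocVQAtest": 96.5,
    "VCR_en_easy": 91.9,
    "ChartQA": 88.3,
    "OCRBench": 87.7,
    "MMBench_test": 86.5,
    "TextVQA": 85.5,
    "InfoVQAtest": 84.5,
    "EgoSchema": 77.9,
    "RealWorldQA": 77.8,
    "MMVetGPT4Turbo": 74.0,
}


def summarize(scores, threshold=80.0):
    """Summarize a mapping of benchmark name -> percent score."""
    values = list(scores.values())
    return {
        "count": len(values),
        "best": max(values),
        # Mean of the supplied rows only, rounded to two decimals.
        "mean": round(mean(values), 2),
        # Number of scores at or above the threshold.
        "high_performers": sum(v >= threshold for v in values),
    }


summary = summarize(listed_scores)
print(summary)
```

On the visible rows this yields a best score of 96.5 and seven scores at or above 80, which agrees with the "Best Score" and "High Performers (80%+)" figures reported in the overview.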