VQAv2
multimodal
About
VQAv2 is a large-scale visual question answering benchmark with over 250,000 images and 1.1 million questions. It was designed to reduce the language bias of earlier VQA datasets, so that a model must actually understand the image to answer a natural-language question rather than exploit linguistic patterns or dataset priors.
Evaluation Stats
Total Models: 3
Organizations: 2
Verified Results: 0
Self-Reported: 3
Benchmark Details
Max Score: 1
Language: en
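A maximum score of 1 is consistent with per-question accuracy averaged over the test set. This page does not say how the self-reported scores were computed, so the following is only a minimal sketch of the commonly cited simplified VQA accuracy rule: an answer earns min(n / 3, 1) credit, where n is the number of human annotators who gave the same answer. The function name `vqa_accuracy` and the lowercase/strip normalization are assumptions made for illustration; the official evaluation applies more extensive answer normalization.

```python
from collections import Counter


def vqa_accuracy(prediction: str, human_answers: list[str]) -> float:
    """Simplified VQA accuracy: min(number of matching human answers / 3, 1)."""

    def normalize(text: str) -> str:
        # Assumption: only trim whitespace and lowercase before comparing.
        return text.strip().lower()

    counts = Counter(normalize(answer) for answer in human_answers)
    return min(counts[normalize(prediction)] / 3.0, 1.0)


# Ten hypothetical annotator answers for one question.
answers = ["yes"] * 2 + ["no"] * 8
partial_credit = vqa_accuracy("yes", answers)  # 2 of 10 annotators agree
full_credit = vqa_accuracy("No ", answers)     # 8 of 10 annotators agree
```

Under this rule an answer shared by at least three annotators receives full credit, which tolerates legitimate disagreement among humans.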
Performance Overview
Score distribution and top performers
Score Distribution: 3 models
Top Score: 80.9%
Average Score: 79.2%
High Performers (80%+): 1
Top Organizations
#1 Mistral AI: 2 models, 79.8%
#2 Meta: 1 model, 78.1%
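The summary figures above can be reproduced from the three leaderboard scores. The sketch below is illustrative only: the mapping of scores to organizations is inferred from the license column and the per-organization model counts, and the half-up rounding to one decimal place is an assumption about how the page rounds. The names `ROWS`, `round1`, and `summarize` are hypothetical.

```python
from collections import defaultdict
from decimal import Decimal, ROUND_HALF_UP

# (organization, score in percent). Organization labels are inferred,
# not listed in the leaderboard table itself.
ROWS = [
    ("Mistral AI", Decimal("80.9")),
    ("Mistral AI", Decimal("78.6")),
    ("Meta", Decimal("78.1")),
]


def round1(value: Decimal) -> Decimal:
    """Round to one decimal place, half up (assumed rounding rule)."""
    return value.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


def summarize(rows):
    """Compute top score, mean score, 80%+ count, and per-organization means."""
    scores = [score for _, score in rows]
    by_org = defaultdict(list)
    for org, score in rows:
        by_org[org].append(score)
    return {
        "top": max(scores),
        "average": round1(sum(scores) / len(scores)),
        "high_performers": sum(1 for score in scores if score >= 80),
        "org_average": {
            org: round1(sum(vals) / len(vals)) for org, vals in by_org.items()
        },
    }


summary = summarize(ROWS)
```

Under these assumptions the mean of the three scores is 79.2 and the per-organization means are 79.8 and 78.1, which matches the figures shown and suggests that the percentage next to each organization is the mean score of its models.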
Leaderboard
3 models ranked by performance on VQAv2
Release Date | License | Score
---|---|---
Nov 18, 2024 | Mistral Research License (MRL) for research; Mistral Commercial License for commercial use | 80.9%
Sep 17, 2024 | Apache 2.0 | 78.6%
Sep 25, 2024 | Llama 3.2 | 78.1%