VQAv2

multimodal
About

VQAv2 is a large-scale visual question answering benchmark with over 250,000 images and 1.1 million questions, designed to reduce language bias so that answering requires visual reasoning. It tests whether a model can understand an image and answer natural-language questions about it, rather than relying on linguistic patterns or dataset biases.
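The page does not say how answers are scored. As a rough illustration, the VQA benchmark family is usually scored with a consensus accuracy: each question has ten human answers, and a prediction gets full credit once at least three annotators gave the same answer. The Python sketch below shows only that simplified formula; the function name, the lowercase/strip normalization, and the sample answers are assumptions for illustration, and the official evaluation applies more answer normalization and averaging than shown here.

```python
def vqa_accuracy(predicted: str, human_answers: list[str]) -> float:
    """Simplified VQA consensus accuracy: min(matching annotators / 3, 1)."""
    # Assumption: a light normalization (trim + lowercase) stands in for
    # the fuller answer cleanup used by real evaluation scripts.
    normalized = predicted.strip().lower()
    matches = sum(1 for a in human_answers if a.strip().lower() == normalized)
    return min(matches / 3.0, 1.0)


# Hypothetical annotations for one question (ten human answers).
human_answers = ["yes"] * 2 + ["no"] * 8

score_yes = vqa_accuracy("yes", human_answers)  # only 2 annotators agree
score_no = vqa_accuracy("No", human_answers)    # 8 annotators agree
```

Under this formula a benchmark score is the mean of the per-question values, which fits the maximum score of 1 listed for this benchmark.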

Evaluation Stats
Total Models: 3
Organizations: 2
Verified Results: 0
Self-Reported: 3
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers

Score Distribution

Models: 3
Top Score: 80.9%
Average Score: 79.2%
High Performers (80%+): 1

Top Organizations

#1 Mistral AI: 2 models, 79.8%
#2 Meta: 1 model, 78.1%
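The summary figures can be reproduced from the three leaderboard scores. The Python sketch below is a minimal check, assuming the average is a plain mean and each organization's figure is the mean of its models' scores (the page does not state how it aggregates). Assigning the 78.6% entry to Mistral AI is an inference from the model counts, not something the page states.

```python
from collections import defaultdict
from statistics import mean

# (organization, score in percent) for the three leaderboard entries.
# Assumption: the 78.6% entry belongs to Mistral AI, inferred from
# "2 models" for Mistral AI and "1 model" for Meta.
results = [
    ("Mistral AI", 80.9),
    ("Mistral AI", 78.6),
    ("Meta", 78.1),
]

scores = [score for _, score in results]
top_score = max(scores)
average_score = mean(scores)
high_performers = sum(score >= 80.0 for score in scores)

# Group scores by organization, then rank organizations by mean score.
by_org = defaultdict(list)
for org, score in results:
    by_org[org].append(score)
org_means = {org: mean(vals) for org, vals in by_org.items()}
ranking = sorted(org_means.items(), key=lambda kv: kv[1], reverse=True)

print(f"Top score: {top_score:.1f}%")
print(f"Average score: {average_score:.1f}%")
print(f"High performers (80%+): {high_performers}")
for rank, (org, value) in enumerate(ranking, start=1):
    print(f"#{rank} {org}: {value:.2f}%")
```

With only three self-reported results, these aggregates are sensitive to any single entry, so the organization ranking is better read as a summary of the table than as a robust comparison.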
Leaderboard
3 models ranked by performance on VQAv2

Date | License | Score
Nov 18, 2024 | Mistral Research License (MRL) for research; Mistral Commercial License for commercial use | 80.9%
Sep 17, 2024 | Apache 2.0 | 78.6%
Sep 25, 2024 | Llama 3.2 | 78.1%
Resources