VQAv2

multimodal

About

VQAv2 is a large-scale visual question answering benchmark with over 250,000 images and 1.1 million questions. It was built to reduce language bias: each question is paired with similar images that lead to different answers, so a model cannot score well by exploiting linguistic patterns or answer priors alone. The benchmark tests whether a model can understand an image and answer natural-language questions about it through genuine visual comprehension.
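VQAv2 answers are commonly scored with the accuracy metric introduced with the original VQA dataset: a predicted answer counts as fully correct if at least 3 of the 10 human annotators gave it, and the official evaluation averages this score over every subset of 9 annotators. The sketch below assumes exact matching after lowercasing and trimming; the official evaluation script also normalizes punctuation, articles, and number words, which is omitted here. The function name `vqa_accuracy` is hypothetical and not taken from any specific tool.

```python
from itertools import combinations


def vqa_accuracy(prediction: str, human_answers: list[str]) -> float:
    """Hypothetical sketch of VQA accuracy for one question.

    Per annotator subset: min(#annotators who gave the prediction / 3, 1).
    The result is averaged over all leave-one-out subsets of the answers.
    Normalization here is only lowercasing and trimming (an assumption).
    """
    pred = prediction.strip().lower()
    answers = [a.strip().lower() for a in human_answers]
    n = len(answers)
    subset_scores = []
    # Each subset drops exactly one annotator (10 choose 9 for 10 answers).
    for subset in combinations(range(n), n - 1):
        matches = sum(1 for i in subset if answers[i] == pred)
        subset_scores.append(min(matches / 3, 1.0))
    return sum(subset_scores) / len(subset_scores)


# Usage: 3 annotators answered "2" and 7 answered "3".
humans = ["2"] * 3 + ["3"] * 7
print(vqa_accuracy("2", humans))  # partial credit: only 3 of 10 agree
print(vqa_accuracy("3", humans))  # full credit
print(vqa_accuracy("4", humans))  # no annotator gave this answer
```

Averaging over annotator subsets keeps machine scores comparable to human agreement, because a human answer is scored against the other 9 annotators in the same way.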

Evaluation Stats

Total Models: 3

Organizations: 2

Verified Results: 0

Self-Reported: 3

Benchmark Details

Max Score: 1

Language

Performance Overview

Score distribution and top performers

Score Distribution: 3 models

Top Score: 80.9%

Average Score: 79.2%

High Performers (80%+): 1 model

Top Organizations

#1 Mistral AI: 2 models, average score 79.8%

#2 Meta: 1 model, average score 78.1%

Leaderboard

3 models ranked by performance on VQAv2

Rank	Model	Organization	Release Date	License	Score
#01	Pixtral Large	Mistral AI	Nov 18, 2024	Mistral Research License (MRL) for research; Mistral Commercial License for commercial use	80.9%
#02	Pixtral-12B	Mistral AI	Sep 17, 2024	Apache 2.0	78.6%
#03	Llama 3.2 90B Instruct	Meta	Sep 25, 2024	Llama 3.2	78.1%
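The summary figures in the Performance Overview (top score, average score, number of models at 80% or above, and per-organization averages) follow directly from the three leaderboard rows. The sketch below recomputes them, assuming an organization's score is the plain mean of its models' scores; this matches the listed 79.8% for Mistral AI, since (80.9 + 78.6) / 2 = 79.75. The names `LEADERBOARD` and `summarize` are illustrative and not part of any tool behind this page.

```python
from collections import defaultdict

# Leaderboard rows as listed on this page: (model, organization, score in %).
LEADERBOARD = [
    ("Pixtral Large", "Mistral AI", 80.9),
    ("Pixtral-12B", "Mistral AI", 78.6),
    ("Llama 3.2 90B Instruct", "Meta", 78.1),
]


def summarize(rows, high_threshold=80.0):
    """Recompute overview stats; org score = mean of its models (assumption)."""
    scores = [score for _, _, score in rows]
    by_org = defaultdict(list)
    for _, org, score in rows:
        by_org[org].append(score)
    org_avgs = {org: sum(s) / len(s) for org, s in by_org.items()}
    # Rank organizations by average score, highest first.
    ranked = sorted(org_avgs.items(), key=lambda kv: kv[1], reverse=True)
    return {
        "top": max(scores),
        "average": sum(scores) / len(scores),
        "high_performers": sum(1 for s in scores if s >= high_threshold),
        "orgs": [(org, len(by_org[org]), avg) for org, avg in ranked],
    }


stats = summarize(LEADERBOARD)
print(f"Top: {stats['top']:.1f}%  Average: {stats['average']:.1f}%  "
      f"80%+: {stats['high_performers']}")
for rank, (org, n_models, avg) in enumerate(stats["orgs"], start=1):
    print(f"#{rank} {org}: {n_models} model(s), {avg:.1f}%")
```

Because every result here is self-reported, these aggregates only summarize the submitted numbers; they say nothing about whether the models were evaluated under identical settings.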

Resources

Research Paper