InfoVQA

multimodal

About

InfoVQA (Infographic Visual Question Answering) is a benchmark of infographics paired with natural-language questions and answers. It tests a model's ability to read text embedded in images, interpret complex visual layouts, and reason over the content of real-world information graphics and data visualizations.

Evaluation Stats

Total Models: 9

Organizations: 4

Verified Results: 0

Self-Reported: 9

Benchmark Details

Max Score: 1

Language

Performance Overview

Score distribution and top performers

Score Distribution

9 models

Top Score

83.4%

Average Score

71.6%

High Performers (80%+)

Top Organizations

#1 Alibaba Cloud / Qwen Team

2 models

83.0%

#2 DeepSeek

3 models

73.3%

#3 Microsoft

1 model

72.7%

#4 Google

3 models

61.8%

Leaderboard

9 models ranked by performance on InfoVQA

Rank	Model	Organization	Date	License	Score
#01	Qwen2.5 VL 32B Instruct	Alibaba Cloud / Qwen Team	Feb 28, 2025	Apache 2.0	83.4%
#02	Qwen2.5 VL 7B Instruct	Alibaba Cloud / Qwen Team	Jan 26, 2025	Apache 2.0	82.6%
#03	DeepSeek VL2	DeepSeek	Dec 13, 2024	deepseek	78.1%
#04	DeepSeek VL2 Small	DeepSeek	Dec 13, 2024	deepseek	75.8%
#05	Phi-4-multimodal-instruct	Microsoft	Feb 1, 2025	MIT	72.7%
#06	Gemma 3 27B	Google	Mar 12, 2025	Gemma	70.6%
#07	DeepSeek VL2 Tiny	DeepSeek	Dec 13, 2024	deepseek	66.1%
#08	Gemma 3 12B	Google	Mar 12, 2025	Gemma	64.9%
#09	Gemma 3 4B	Google	Mar 12, 2025	Gemma	50.0%
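
The summary figures in the Performance Overview can be recomputed from the leaderboard rows. The sketch below is one way to do that, not the site's own code: the names `LEADERBOARD` and `summarize` are hypothetical, and it assumes the per-organization figures are unweighted means of each organization's model scores, rounded to one decimal place.

```python
from collections import defaultdict
from statistics import mean

# (model, organization, score in percent), copied from the leaderboard rows.
LEADERBOARD = [
    ("Qwen2.5 VL 32B Instruct", "Alibaba Cloud / Qwen Team", 83.4),
    ("Qwen2.5 VL 7B Instruct", "Alibaba Cloud / Qwen Team", 82.6),
    ("DeepSeek VL2", "DeepSeek", 78.1),
    ("DeepSeek VL2 Small", "DeepSeek", 75.8),
    ("Phi-4-multimodal-instruct", "Microsoft", 72.7),
    ("Gemma 3 27B", "Google", 70.6),
    ("DeepSeek VL2 Tiny", "DeepSeek", 66.1),
    ("Gemma 3 12B", "Google", 64.9),
    ("Gemma 3 4B", "Google", 50.0),
]


def summarize(rows, high_threshold=80.0):
    """Return top score, overall mean, high-performer count, and per-org means."""
    scores = [score for _, _, score in rows]

    # Group scores by organization (assumption: plain unweighted mean per org).
    by_org = defaultdict(list)
    for _, org, score in rows:
        by_org[org].append(score)
    org_avg = {org: round(mean(vals), 1) for org, vals in by_org.items()}

    return {
        "top": max(scores),
        "average": round(mean(scores), 1),
        "high_performers": sum(s >= high_threshold for s in scores),
        # Organizations ordered from highest to lowest mean score.
        "org_average": dict(
            sorted(org_avg.items(), key=lambda kv: kv[1], reverse=True)
        ),
    }


summary = summarize(LEADERBOARD)
for org, avg in summary["org_average"].items():
    print(f"{org}: {avg}%")
print(f"Top: {summary['top']}%  Average: {summary['average']}%")
```

Under these assumptions the results match the page: a top score of 83.4%, an average of 71.6%, and organization means of 83.0%, 73.3%, 72.7%, and 61.8%. The `high_performers` count is derived from the table alone, since the page shows no value next to that label.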
