DocVQA

multimodal
About

DocVQA (Document Visual Question Answering) is a benchmark that evaluates an AI model's ability to read document images and answer questions about their content. It tests multimodal reasoning, combining computer vision and natural language processing to extract information from documents, and measures how well a system interprets visual document layout when responding to high-level user queries.
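Answers on DocVQA are commonly scored with ANLS (Average Normalized Levenshtein Similarity), which gives partial credit for near-miss strings such as small OCR errors. This page does not state which metric the self-reported scores use, so the following is only a minimal sketch of that common metric, assuming the usual 0.5 threshold and case-insensitive matching. The function names `levenshtein` and `anls` are illustrative, not taken from any official evaluation script.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def anls(predictions, references, threshold=0.5):
    """Mean over questions of the best similarity against any accepted answer.

    predictions: list of predicted answer strings.
    references:  list of lists of accepted answer strings (one list per question).
    threshold:   assumed cutoff; normalized distances at or above it score 0.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    total = 0.0
    for pred, answers in zip(predictions, references):
        p = pred.strip().lower()
        best = 0.0
        for ans in answers:
            a = ans.strip().lower()
            longest = max(len(p), len(a))
            nl = levenshtein(p, a) / longest if longest else 0.0
            best = max(best, 1.0 - nl if nl < threshold else 0.0)
        total += best
    return total / len(predictions)
```

For example, a prediction that differs from a four-character answer by one character gets 0.75 for that question, while an unrelated string gets 0. The result lies in the range 0 to 1, which matches the maximum score of 1 listed under Benchmark Details.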

Evaluation Stats
Total Models: 26
Organizations: 10
Verified Results: 0
Self-Reported: 26
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers

Score Distribution

Models: 26
Top Score: 96.4%
Average Score: 91.4%
High Performers (80%+): 25
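The summary figures above are simple aggregates over the per-model scores. As a rough sketch of the arithmetic, the snippet below computes the same three statistics over the ten scores visible on the first leaderboard page only, so its average differs from the 26-model figure reported here. The helper name `summarize` and the 0.80 cutoff parameter are illustrative assumptions, with scores expressed as fractions of the maximum score of 1.

```python
from statistics import mean


def summarize(scores, high_threshold=0.80):
    """Return model count, top score, mean score, and count at or above the threshold."""
    if not scores:
        raise ValueError("scores must be non-empty")
    return {
        "models": len(scores),
        "top": max(scores),
        "average": mean(scores),
        "high_performers": sum(1 for s in scores if s >= high_threshold),
    }


# The ten scores shown on the first leaderboard page (not all 26 models).
visible = [0.964, 0.957, 0.952, 0.952, 0.949, 0.948, 0.944, 0.944, 0.936, 0.935]
summary = summarize(visible)
print(f"{summary['models']} models, top {summary['top']:.1%}, "
      f"average {summary['average']:.1%}, "
      f"{summary['high_performers']} at or above 80%")
```

Since the full leaderboard averages 91.4% with 25 of 26 models at or above 80%, the sixteen models not shown on this page must score lower on average than these ten.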

Top Organizations

#1  Alibaba Cloud / Qwen Team   4 models   95.5%
#2  Anthropic                   1 model    95.2%
#3  Microsoft                   1 model    93.2%
#4  Mistral AI                  3 models   93.0%
#5  Amazon                      2 models   93.0%
Leaderboard
26 models ranked by performance on DocVQA
Date           License                               Score
Jan 26, 2025   tongyi-qianwen                        96.4%
Jan 26, 2025   Apache 2.0                            95.7%
Oct 22, 2024   Proprietary                           95.2%
Mar 27, 2025   Apache 2.0                            95.2%
Jun 20, 2025   Apache 2.0                            94.9%
Feb 28, 2025   Apache 2.0                            94.8%
Apr 5, 2025    Llama 4 Community License Agreement   94.4%
Apr 5, 2025    Llama 4 Community License Agreement   94.4%
Aug 13, 2024   Proprietary                           93.6%
Nov 20, 2024   Proprietary                           93.5%
Showing 1 to 10 of 26 models
Resources