DocVQA
multimodal
About
DocVQA (Document Visual Question Answering) is a benchmark that evaluates how well AI models understand document images and answer questions about their content. It tests multimodal reasoning, combining computer vision and natural language processing to extract information from documents: a model must interpret the visual layout of a document and respond to user queries about what it contains.
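DocVQA is commonly scored with ANLS (Average Normalized Levenshtein Similarity). ANLS ranges from 0 to 1, which matches the Max Score of 1 listed under Benchmark Details, although this page does not state which metric each self-reported result uses. The sketch below is a minimal, unofficial Python version of ANLS. It assumes a threshold τ = 0.5 and a simple lowercase-and-collapse-whitespace normalization; the function names are illustrative and are not taken from the official evaluation script.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def normalize(s: str) -> str:
    # Assumed normalization: lowercase and collapse runs of whitespace.
    return " ".join(s.lower().split())


def similarity(answer: str, prediction: str, tau: float = 0.5) -> float:
    """1 - normalized edit distance, or 0 if the distance reaches tau."""
    a, p = normalize(answer), normalize(prediction)
    longest = max(len(a), len(p))
    if longest == 0:
        return 1.0  # both empty: treat as an exact match
    nl = levenshtein(a, p) / longest
    return 1.0 - nl if nl < tau else 0.0


def anls(gold_answers: list[list[str]], predictions: list[str], tau: float = 0.5) -> float:
    """Mean over questions of the best similarity against any accepted answer."""
    if len(gold_answers) != len(predictions):
        raise ValueError("need exactly one prediction per question")
    if not predictions:
        return 0.0
    total = sum(
        max(similarity(g, p, tau) for g in golds)
        for golds, p in zip(gold_answers, predictions)
    )
    return total / len(predictions)
```

For example, `anls([["$1,200"], ["John Smith", "J. Smith"]], ["$1,200", "john smth"])` gives 0.95: the first answer matches exactly (1.0), and "john smth" is one edit away from the 10-character "john smith" (1 - 0.1 = 0.9). The threshold zeroes out answers that are mostly wrong, so near-miss OCR errors earn partial credit while unrelated strings earn none.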
Evaluation Stats
Total Models
26
Organizations
10
Verified Results
0
Self-Reported
26
Benchmark Details
Max Score
1
Language
en
Performance Overview
Score distribution and top performers
Score Distribution
26 models
Top Score
96.4%
Average Score
91.4%
High Performers (80%+)
25
Top Organizations
#1 Alibaba Cloud / Qwen Team
4 models
95.5%
#2 Anthropic
1 model
95.2%
#3 Microsoft
1 model
93.2%
#4 Mistral AI
3 models
93.0%
#5 Amazon
2 models
93.0%
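The overview figures above are simple aggregates over the reported scores. The following Python sketch shows one way to compute them. It assumes "Average Score" is a plain arithmetic mean and that "High Performers (80%+)" means a score of at least 80%; the page does not document either rule, and `summarize` is a hypothetical helper. The input is only the ten scores visible in the leaderboard below, not the full list of 26.

```python
def summarize(scores: list[float], high_cutoff: float = 80.0) -> dict:
    """Model count, top score, mean score, and count at or above a cutoff (percent)."""
    if not scores:
        raise ValueError("no scores to summarize")
    return {
        "models": len(scores),
        "top": max(scores),
        "average": sum(scores) / len(scores),  # assumed: plain arithmetic mean
        "high_performers": sum(1 for s in scores if s >= high_cutoff),  # assumed: inclusive cutoff
    }


# Only the 10 scores shown on the first leaderboard page (of 26 models).
visible = [96.4, 95.7, 95.2, 95.2, 94.9, 94.8, 94.4, 94.4, 93.6, 93.5]
stats = summarize(visible)
print(stats["top"], round(stats["average"], 2), stats["high_performers"])
```

On the ten visible rows the mean is about 94.81%, higher than the page's 91.4% average because the 16 lower-ranked models are not included.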
Leaderboard
26 models ranked by performance on DocVQA
Date | License | Score
---|---|---
Jan 26, 2025 | tongyi-qianwen | 96.4%
Jan 26, 2025 | Apache 2.0 | 95.7%
Oct 22, 2024 | Proprietary | 95.2%
Mar 27, 2025 | Apache 2.0 | 95.2%
Jun 20, 2025 | Apache 2.0 | 94.9%
Feb 28, 2025 | Apache 2.0 | 94.8%
Apr 5, 2025 | Llama 4 Community License Agreement | 94.4%
Apr 5, 2025 | Llama 4 Community License Agreement | 94.4%
Aug 13, 2024 | Proprietary | 93.6%
Nov 20, 2024 | Proprietary | 93.5%
Showing 1 to 10 of 26 models