RealWorldQA

multimodal
About

RealWorldQA is a multimodal benchmark of 765 real-world images, each paired with a question and an easily verifiable answer. It is designed to evaluate the spatial understanding of vision-language models: how well a model comprehends real-world scenes, reasons about spatial relationships, and answers questions about everyday situations captured in authentic photographs.

Evaluation Stats
Total Models: 6
Organizations: 3
Verified Results: 0
Self-Reported: 6
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers

Score Distribution

Models: 6
Top Score: 77.8%
Average Score: 69.1%
High Performers (80%+): 0
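
The summary figures above follow directly from the six leaderboard scores. A minimal sketch of the arithmetic, assuming the average is a plain unweighted mean rounded to one decimal place (the variable names are illustrative, not part of the site):

```python
from statistics import mean

# Scores copied from the leaderboard on this page, in percent.
scores = [77.8, 70.3, 68.7, 68.4, 65.4, 64.2]

top_score = max(scores)
# Assumption: the page reports an unweighted mean rounded to one decimal.
average_score = round(mean(scores), 1)
# "High performers" are models scoring 80% or more.
high_performers = sum(1 for s in scores if s >= 80.0)

print(f"models={len(scores)} top={top_score}% "
      f"avg={average_score}% high={high_performers}")
```

Under that assumption the computed values agree with the reported top score of 77.8%, average of 69.1%, and zero high performers.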

Top Organizations

#1 Alibaba Cloud / Qwen Team: 2 models, 74.0%
#2 xAI: 1 model, 68.7%
#3 DeepSeek: 3 models, 66.0%
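
The organization ranking appears to be each organization's mean score across its listed models. The sketch below reproduces it under two assumptions: the mapping of scores to organizations is inferred from the license labels and model counts on this page, and the ranking uses an unweighted mean.

```python
from collections import defaultdict
from statistics import mean

# (organization, score in percent). Assumption: the score-to-organization
# mapping is inferred from the license labels and per-organization model
# counts shown on this page, not stated by it.
rows = [
    ("Alibaba Cloud / Qwen Team", 77.8),
    ("Alibaba Cloud / Qwen Team", 70.3),
    ("xAI", 68.7),
    ("DeepSeek", 68.4),
    ("DeepSeek", 65.4),
    ("DeepSeek", 64.2),
]

# Group scores by organization.
by_org = defaultdict(list)
for org, score in rows:
    by_org[org].append(score)

# Unweighted mean per organization, ranked from highest to lowest.
org_avg = {org: mean(vals) for org, vals in by_org.items()}
ranking = sorted(org_avg.items(), key=lambda kv: kv[1], reverse=True)

for rank, (org, avg) in enumerate(ranking, start=1):
    print(f"#{rank} {org}: {len(by_org[org])} model(s), mean {avg:.2f}%")
```

The two-model mean for Alibaba Cloud / Qwen Team works out to 74.05%, which the page displays as 74.0%; the other two means match the displayed values.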
Leaderboard
6 models ranked by performance on RealWorldQA
Date            License           Score
Aug 29, 2024    tongyi-qianwen    77.8%
Mar 27, 2025    Apache 2.0        70.3%
Apr 12, 2024    Proprietary       68.7%
Dec 13, 2024    deepseek          68.4%
Dec 13, 2024    deepseek          65.4%
Dec 13, 2024    deepseek          64.2%