RealWorldQA
multimodal
About
RealWorldQA is a multimodal benchmark of 765 real-world images, each paired with a question and an easily verifiable answer, designed to evaluate the spatial understanding of vision-language models. It tests a model's ability to interpret real-world scenes, reason about spatial relationships, and answer questions about everyday situations captured in photographs.
Evaluation Stats
Total Models: 6
Organizations: 3
Verified Results: 0
Self-Reported: 6
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers
Score Distribution
6 models
Top Score: 77.8%
Average Score: 69.1%
High Performers (80%+): 0
Top Organizations
#1 Alibaba Cloud / Qwen Team (2 models): 74.0%
#2 xAI (1 model): 68.7%
#3 DeepSeek (3 models): 66.0%
Leaderboard
6 models ranked by performance on RealWorldQA
Release Date | License | Score
---|---|---
Aug 29, 2024 | tongyi-qianwen | 77.8%
Mar 27, 2025 | Apache 2.0 | 70.3%
Apr 12, 2024 | Proprietary | 68.7%
Dec 13, 2024 | deepseek | 68.4%
Dec 13, 2024 | deepseek | 65.4%
Dec 13, 2024 | deepseek | 64.2%
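
The summary figures on this page (top score, average score, high-performer count, per-organization averages) can be recomputed from the six leaderboard scores. The sketch below is one way to do that, not the site's actual method. The organization labels attached to each row are an assumption: the table does not name the organizations, so they are inferred here from the License column and the per-organization model counts. The function name `summarize` and the `high_threshold` parameter are hypothetical.

```python
from statistics import mean

# Scores (in percent) copied from the leaderboard table.
# ASSUMPTION: the organization for each row is inferred from the License
# column and the model counts listed under Top Organizations.
rows = [
    ("Alibaba Cloud / Qwen Team", 77.8),
    ("Alibaba Cloud / Qwen Team", 70.3),
    ("xAI", 68.7),
    ("DeepSeek", 68.4),
    ("DeepSeek", 65.4),
    ("DeepSeek", 64.2),
]


def summarize(rows, high_threshold=80.0):
    """Recompute the overview statistics from (organization, score) pairs."""
    scores = [score for _, score in rows]

    # Group scores by organization, then average each group.
    by_org = {}
    for org, score in rows:
        by_org.setdefault(org, []).append(score)
    org_means = {org: mean(vals) for org, vals in by_org.items()}

    return {
        "total_models": len(scores),
        "top_score": max(scores),
        "average_score": mean(scores),
        # Count of models at or above the high-performer threshold.
        "high_performers": sum(score >= high_threshold for score in scores),
        "org_means": org_means,
    }


if __name__ == "__main__":
    stats = summarize(rows)
    print(f"Top score: {stats['top_score']:.1f}%")
    print(f"Average score: {stats['average_score']:.1f}%")
    print(f"High performers (80%+): {stats['high_performers']}")
    # Rank organizations by mean score, highest first.
    ranked = sorted(stats["org_means"].items(), key=lambda kv: kv[1], reverse=True)
    for rank, (org, avg) in enumerate(ranked, start=1):
        print(f"#{rank} {org}: {avg:.2f}%")
```

Under the assumed grouping, the recomputed values agree with the figures reported above up to rounding, which suggests the per-organization percentages are plain means of each organization's model scores.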