RealWorldQA

multimodal

About

RealWorldQA is a multimodal benchmark of 765 real-world images, each paired with a question and an easily verifiable answer, designed to evaluate the spatial understanding of vision-language models. It tests a model's ability to comprehend real-world scenes, reason about spatial relationships, and answer questions about everyday situations captured in authentic photographs.

Evaluation Stats

Total Models: 6

Organizations: 3

Verified Results: 0

Self-Reported: 6

Benchmark Details

Max Score: 1

Language

Performance Overview

Score distribution and top performers

Score Distribution

6 models

Top Score

77.8%

Average Score

69.1%

High Performers (80%+)

0

Top Organizations

#1 Alibaba Cloud / Qwen Team

2 models

74.0%

#2 xAI

1 model

68.7%

#3 DeepSeek

3 models

66.0%

Leaderboard

6 models ranked by performance on RealWorldQA

Rank	Model	Organization	Release Date	License	Score
#01	Qwen2-VL-72B-Instruct	Alibaba Cloud / Qwen Team	Aug 29, 2024	tongyi-qianwen	77.8%
#02	Qwen2.5-Omni-7B	Alibaba Cloud / Qwen Team	Mar 27, 2025	Apache 2.0	70.3%
#03	Grok-1.5V	xAI	Apr 12, 2024	Proprietary	68.7%
#04	DeepSeek VL2	DeepSeek	Dec 13, 2024	deepseek	68.4%
#05	DeepSeek VL2 Small	DeepSeek	Dec 13, 2024	deepseek	65.4%
#06	DeepSeek VL2 Tiny	DeepSeek	Dec 13, 2024	deepseek	64.2%
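
The summary figures in the Performance Overview (top score, average score, per-organization averages) appear to be plain arithmetic means over the leaderboard rows. The sketch below recomputes them as a sanity check. It assumes unweighted means and an inclusive 80% cutoff for "high performers"; the names ROWS and summarize are hypothetical and are not part of any published tooling for this leaderboard.

```python
from collections import defaultdict

# Leaderboard rows copied from the table above: (model, organization, score in %).
ROWS = [
    ("Qwen2-VL-72B-Instruct", "Alibaba Cloud / Qwen Team", 77.8),
    ("Qwen2.5-Omni-7B", "Alibaba Cloud / Qwen Team", 70.3),
    ("Grok-1.5V", "xAI", 68.7),
    ("DeepSeek VL2", "DeepSeek", 68.4),
    ("DeepSeek VL2 Small", "DeepSeek", 65.4),
    ("DeepSeek VL2 Tiny", "DeepSeek", 64.2),
]


def summarize(rows, high_threshold=80.0):
    """Recompute the overview statistics from leaderboard rows.

    Assumption: every average is an unweighted arithmetic mean, and a
    "high performer" is any model scoring at or above high_threshold.
    """
    scores = [score for _, _, score in rows]

    # Group scores by organization.
    by_org = defaultdict(list)
    for _, org, score in rows:
        by_org[org].append(score)

    # Mean score per organization, ordered from highest to lowest.
    org_means = {org: sum(vals) / len(vals) for org, vals in by_org.items()}
    ranked = dict(sorted(org_means.items(), key=lambda kv: kv[1], reverse=True))

    return {
        "top": max(scores),
        "mean": sum(scores) / len(scores),
        "high_performers": sum(score >= high_threshold for score in scores),
        "org_means": ranked,
    }


if __name__ == "__main__":
    summary = summarize(ROWS)
    print(f"Top score: {summary['top']:.1f}%")
    print(f"Average score: {summary['mean']:.1f}%")
    print(f"High performers (80%+): {summary['high_performers']}")
    for org, mean in summary["org_means"].items():
        print(f"{org}: {mean:.2f}%")
```

Under these assumptions the overall mean is 414.8 / 6, about 69.1%, and no model reaches the 80% threshold, which agrees with the figures shown in the overview. All six results are self-reported, so the recomputation checks only the page's internal consistency, not the scores themselves.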