ERQA

multimodal

About

ERQA (Embodied Reasoning Question Answering) is a multimodal benchmark that evaluates the spatial reasoning and physical understanding of AI models through questions about embodied interactions. It tests a model's ability to reason about 3D environments, spatial relationships, and the physical consequences of actions, which is the kind of reasoning robotics applications require in addition to language understanding.

Evaluation Stats

Total Models: 3

Organizations: 1

Verified Results: 0

Self-Reported: 3

Benchmark Details

Max Score: 1

Language

Performance Overview

Score distribution and top performers

Score Distribution

3 models

Top Score

65.7%

Average Score

55.0%

High Performers (80%+)

0

Top Organizations

#1 OpenAI

3 models

55.0%

Leaderboard

3 models ranked by performance on ERQA

Rank	Model	Organization	Release Date	License	Score
#01	GPT-5	OpenAI	Aug 7, 2025	Proprietary	65.7%
#02	o3	OpenAI	Apr 16, 2025	Proprietary	64.0%
#03	GPT-4o	OpenAI	Aug 6, 2024	Proprietary	35.2%
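
The summary figures in the Performance Overview can be recomputed from the three leaderboard rows. The following is a minimal sketch, assuming the scores are percentages and that the average is a plain unweighted mean rounded to one decimal place; the variable names are illustrative and not part of any ERQA tooling.

```python
from statistics import mean

# Scores copied from the leaderboard rows (percent, self-reported).
scores = {"GPT-5": 65.7, "o3": 64.0, "GPT-4o": 35.2}

# Assumption: "High Performers (80%+)" counts models scoring at least 80%.
HIGH_PERFORMER_THRESHOLD = 80.0

top_model = max(scores, key=scores.get)
average = round(mean(scores.values()), 1)
high_performers = [m for m, s in scores.items() if s >= HIGH_PERFORMER_THRESHOLD]

print(f"Top score: {top_model} at {scores[top_model]}%")
print(f"Average score: {average}%")
print(f"High performers (80%+): {len(high_performers)}")
```

Under these assumptions the sketch reproduces the reported top score of 65.7% and average of 55.0%, and no model reaches the 80% threshold.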
