ScreenSpot

multimodal

About

ScreenSpot is a GUI grounding benchmark that evaluates AI models' ability to locate and identify specific interface elements within screenshots of computer applications. This foundational evaluation tests visual understanding of graphical user interfaces, spatial reasoning for element localization, and the capacity to understand UI components across different software applications and operating systems.
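As a rough illustration of what "locating an element" can mean in scoring terms, the sketch below assumes a common GUI-grounding convention: the model predicts a click point, and a prediction counts as correct when that point falls inside the target element's bounding box. The function names, the `(left, top, right, bottom)` box layout, and the sample data are all hypothetical and are not taken from this page or from any official ScreenSpot evaluation code.

```python
from typing import Sequence, Tuple

# Assumed layouts (hypothetical, for illustration only).
Box = Tuple[float, float, float, float]  # (left, top, right, bottom) in pixels
Point = Tuple[float, float]              # (x, y) in pixels


def point_in_box(point: Point, box: Box) -> bool:
    """Return True when the point lies inside the box, edges included."""
    x, y = point
    left, top, right, bottom = box
    return left <= x <= right and top <= y <= bottom


def grounding_accuracy(predictions: Sequence[Point], targets: Sequence[Box]) -> float:
    """Fraction of predicted click points that land inside their target box."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        return 0.0
    hits = sum(point_in_box(p, b) for p, b in zip(predictions, targets))
    return hits / len(targets)


# Made-up predictions and target boxes, not real benchmark data.
predictions = [(120.0, 45.0), (300.0, 410.0), (15.0, 15.0)]
targets = [
    (100.0, 30.0, 180.0, 60.0),  # first point falls inside
    (10.0, 10.0, 50.0, 50.0),    # second point falls outside
    (0.0, 0.0, 40.0, 40.0),      # third point falls inside
]
accuracy = grounding_accuracy(predictions, targets)
print(f"accuracy = {accuracy:.3f}")
```

Under this assumed convention the result is a fraction between 0 and 1, which lines up with a maximum score of 1; multiplying by 100 gives a percentage like those shown on the leaderboard.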

Evaluation Stats

Total Models: 3

Organizations: 1

Verified Results: 0

Self-Reported: 3

Benchmark Details

Max Score: 1

Language

Performance Overview

Score distribution and top performers

Score Distribution

3 models

Top Score

88.5%

Average Score

86.8%

High Performers (80%+)

Top Organizations

#1 Alibaba Cloud / Qwen Team

3 models

86.8%

Leaderboard

3 models ranked by performance on ScreenSpot

Rank	Model	Organization	Date	License	Score
#01	Qwen2.5 VL 32B Instruct	Alibaba Cloud / Qwen Team	Feb 28, 2025	Apache 2.0	88.5%
#02	Qwen2.5 VL 72B Instruct	Alibaba Cloud / Qwen Team	Jan 26, 2025	tongyi-qianwen	87.1%
#03	Qwen2.5 VL 7B Instruct	Alibaba Cloud / Qwen Team	Jan 26, 2025	Apache 2.0	84.7%
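The summary figures in the Performance Overview can be reproduced from the three leaderboard scores. The short sketch below recomputes the top score, the average, and the number of models at or above the 80% "High Performers" threshold; the variable names are illustrative, and the scores are the self-reported percentages listed above.

```python
from statistics import mean

# Self-reported ScreenSpot scores (percent), copied from the leaderboard.
leaderboard = [
    ("Qwen2.5 VL 32B Instruct", 88.5),
    ("Qwen2.5 VL 72B Instruct", 87.1),
    ("Qwen2.5 VL 7B Instruct", 84.7),
]

scores = [score for _, score in leaderboard]

# Best-scoring entry, chosen by its score field.
top_model, top_score = max(leaderboard, key=lambda row: row[1])

# Mean score, rounded to one decimal place as displayed on the page.
average_score = round(mean(scores), 1)

# Count of models at or above the 80% threshold.
high_performers = sum(score >= 80.0 for score in scores)

print(f"Top: {top_model} ({top_score}%)")
print(f"Average: {average_score}%")
print(f"High performers (80%+): {high_performers}")
```

The exact mean is 260.3 / 3 ≈ 86.77, which rounds to the 86.8% shown as the average score.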

Resources

Research Paper