VisualWebBench

Category: multimodal

About

VisualWebBench is a comprehensive multimodal benchmark that assesses how well AI models understand web pages and ground their content. It contains 1,500 human-curated instances drawn from 139 real websites across 87 sub-domains. The benchmark covers seven web-related tasks, including captioning, webpage QA, OCR, element grounding, and action prediction, to measure how proficiently multimodal models understand web interfaces.

Evaluation Stats

Total Models: 2

Organizations: 1

Verified Results: 0

Self-Reported: 2

Benchmark Details

Max Score: 1


Performance Overview

Score distribution and top performers

Score Distribution: 2 models

Top Score: 79.7%

Average Score: 78.7%

High Performers (80%+): 0

Top Organizations

#1 Amazon: 2 models, average score 78.7%

Leaderboard

2 models ranked by performance on VisualWebBench

Rank	Model	Organization	Date	License	Score
#01	Nova Pro	Amazon	Nov 20, 2024	Proprietary	79.7%
#02	Nova Lite	Amazon	Nov 20, 2024	Proprietary	77.7%
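The summary figures under Performance Overview and Top Organizations appear to follow directly from the two leaderboard rows: the top score is the maximum, while both the overall average and Amazon's figure are the mean of 79.7% and 77.7%. The Python sketch below recomputes them. It assumes scores are stored as fractions of the maximum score of 1 and that "High Performers" counts models scoring at or above 0.80. The `Entry` type and the `summarize` and `org_averages` functions are hypothetical illustrations, not part of any leaderboard API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Entry:
    model: str
    organization: str
    score: float  # assumption: fraction of the maximum score of 1


# The two rows from the leaderboard, with percentages converted to fractions.
LEADERBOARD = [
    Entry("Nova Pro", "Amazon", 0.797),
    Entry("Nova Lite", "Amazon", 0.777),
]

# Assumption: "High Performers (80%+)" means score >= 0.80.
HIGH_PERFORMER_THRESHOLD = 0.80


def summarize(entries):
    """Return (top entry, mean score, list of high performers)."""
    top = max(entries, key=lambda e: e.score)
    average = sum(e.score for e in entries) / len(entries)
    high = [e for e in entries if e.score >= HIGH_PERFORMER_THRESHOLD]
    return top, average, high


def org_averages(entries):
    """Mean score per organization, as in the Top Organizations panel."""
    groups = defaultdict(list)
    for e in entries:
        groups[e.organization].append(e.score)
    return {org: sum(scores) / len(scores) for org, scores in groups.items()}


top, average, high = summarize(LEADERBOARD)
print(f"Top score: {top.model} {top.score:.1%}")
print(f"Average score: {average:.1%}")
print(f"High performers (80%+): {len(high)}")
for org, avg in org_averages(LEADERBOARD).items():
    print(f"{org}: {avg:.1%}")
```

With only two self-reported results from a single organization, the overall average and the per-organization average coincide. They would diverge once models from other organizations are added.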

Resources

Research Paper