Vibe-Eval

multimodal
About

Vibe-Eval is a challenging multimodal evaluation benchmark of 269 visual understanding prompts, including 100 hard-difficulty questions, with expert-authored gold-standard responses. It tests the visual comprehension of frontier multimodal chat models. More than 50% of the hard questions are typically answered incorrectly, which points to significant gaps in current visual reasoning.

Evaluation Stats
Total Models: 8
Organizations: 1
Verified Results: 0
Self-Reported: 8
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers

Score Distribution (8 models)

Top Score: 67.2%
Average Score: 56.2%
High Performers (80%+): 0

Top Organizations

#1 Google (8 models, 56.2%)
Leaderboard
8 models ranked by performance on Vibe-Eval
Date | License | Score
Jun 5, 2025 | Proprietary | 67.2%
May 20, 2025 | Proprietary | 65.6%
May 20, 2025 | Proprietary | 65.4%
Dec 1, 2024 | Proprietary | 56.3%
May 1, 2024 | Proprietary | 53.9%
Jun 17, 2025 | Creative Commons Attribution 4.0 License | 51.3%
May 1, 2024 | Proprietary | 48.9%
Mar 15, 2024 | Proprietary | 40.9%
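
The summary figures in the Performance Overview can be checked against the eight leaderboard scores listed above. The following is a minimal sketch in Python. It assumes the average is a plain arithmetic mean rounded to one decimal place and that "High Performers" counts scores at or above 80%. The variable names are illustrative and do not come from the page.

```python
from statistics import mean

# Scores in percent, copied from the leaderboard in ranked order.
scores = [67.2, 65.6, 65.4, 56.3, 53.9, 51.3, 48.9, 40.9]

# Assumed cutoff, taken from the "High Performers (80%+)" label.
HIGH_PERFORMER_THRESHOLD = 80.0

top_score = max(scores)
# Assumption: the page reports an unweighted mean rounded to one decimal.
average_score = round(mean(scores), 1)
high_performers = sum(1 for s in scores if s >= HIGH_PERFORMER_THRESHOLD)

print(f"Models: {len(scores)}")
print(f"Top score: {top_score}%")
print(f"Average score: {average_score}%")
print(f"High performers (80%+): {high_performers}")
```

Under these assumptions the computed values agree with the reported top score of 67.2%, average of 56.2%, and zero high performers.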
Resources