Vibe-Eval
multimodal
About
Vibe-Eval is a challenging multimodal evaluation benchmark of 269 visual understanding prompts, including 100 hard-difficulty questions with expert-authored gold-standard responses. It tests the visual comprehension of frontier multimodal chat models. More than 50% of the hard questions are typically answered incorrectly, which points to significant gaps in current visual reasoning abilities.
Evaluation Stats
Total Models: 8
Organizations: 1
Verified Results: 0
Self-Reported: 8
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers
Score Distribution: 8 models
Top Score: 67.2%
Average Score: 56.2%
High Performers (80%+): 0
Top Organizations
#1 Google: 8 models, 56.2% average
Leaderboard
8 models ranked by performance on Vibe-Eval
Date | License | Score
---|---|---
Jun 5, 2025 | Proprietary | 67.2%
May 20, 2025 | Proprietary | 65.6%
May 20, 2025 | Proprietary | 65.4%
Dec 1, 2024 | Proprietary | 56.3%
May 1, 2024 | Proprietary | 53.9%
Jun 17, 2025 | Creative Commons Attribution 4.0 License | 51.3%
May 1, 2024 | Proprietary | 48.9%
Mar 15, 2024 | Proprietary | 40.9%
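
The summary figures in the Performance Overview can be checked against the leaderboard rows. The following is a minimal sketch, assuming the average is a plain arithmetic mean of the eight listed scores rounded to one decimal place, and that "High Performers" counts scores of 80% or more. The function name `summarize` and the constant `HIGH_PERFORMER_THRESHOLD` are hypothetical, introduced here only for illustration.

```python
from statistics import mean

# Scores copied from the leaderboard rows, in percent.
scores = [67.2, 65.6, 65.4, 56.3, 53.9, 51.3, 48.9, 40.9]

# Assumed cutoff for the "High Performers (80%+)" count.
HIGH_PERFORMER_THRESHOLD = 80.0


def summarize(scores, threshold=HIGH_PERFORMER_THRESHOLD):
    """Return model count, top score, mean score, and high-performer count."""
    return {
        "models": len(scores),
        "top": max(scores),
        "average": round(mean(scores), 1),
        "high_performers": sum(s >= threshold for s in scores),
    }


summary = summarize(scores)
print(summary)
```

Under these assumptions the result agrees with the figures reported on the page: 8 models, a top score of 67.2%, an average of 56.2%, and no model at or above 80%.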