POPE

multimodal
About

POPE (Polling-based Object Probing Evaluation) is a benchmark for measuring object hallucination in large vision-language models (LVLMs). It polls a model with yes/no questions about whether a given object appears in an image, mixing objects that are present with objects that are absent. A "yes" to an absent object counts as a hallucination, so the results indicate how reliably a model grounds its answers in the visual content and how often it reports objects that are not there.
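A minimal sketch of POPE-style polling and scoring follows. It assumes the question template "Is there a {object} in the image?", random sampling of absent objects, model answers already normalized to "yes"/"no", and "yes" as the positive class. The helper names `build_pope_questions` and `pope_metrics` are hypothetical, and this page does not say which metric its percentages report.

```python
import random
from typing import Dict, List, Sequence, Tuple

TEMPLATE = "Is there a {} in the image?"  # assumed polling template


# Hypothetical helper: builds a balanced set of yes/no questions for one image.
# Assumes "random" negative sampling: absent objects are drawn uniformly from a
# vocabulary of objects that are not annotated as present in the image.
def build_pope_questions(
    present: Sequence[str],
    vocabulary: Sequence[str],
    rng: random.Random,
) -> List[Tuple[str, str]]:
    candidates = [obj for obj in vocabulary if obj not in present]
    absent = rng.sample(candidates, k=min(len(present), len(candidates)))
    questions = [(TEMPLATE.format(obj), "yes") for obj in present]
    questions += [(TEMPLATE.format(obj), "no") for obj in absent]
    rng.shuffle(questions)  # avoid a fixed yes/no ordering
    return questions


# Hypothetical helper: scores normalized answers with "yes" as the positive class.
def pope_metrics(labels: Sequence[str], answers: Sequence[str]) -> Dict[str, float]:
    pairs = list(zip(labels, answers))
    tp = sum(l == "yes" and a == "yes" for l, a in pairs)
    fp = sum(l == "no" and a == "yes" for l, a in pairs)  # hallucinated objects
    tn = sum(l == "no" and a == "no" for l, a in pairs)
    fn = sum(l == "yes" and a == "no" for l, a in pairs)
    total = len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "yes_ratio": (tp + fp) / total if total else 0.0,  # share of "yes" answers
    }


if __name__ == "__main__":
    rng = random.Random(0)
    qs = build_pope_questions(["dog", "car"], ["dog", "car", "cat", "bus", "tree"], rng)
    labels = [label for _, label in qs]
    # A model that always answers "yes": accuracy 0.5, recall 1.0, yes_ratio 1.0
    print(pope_metrics(labels, ["yes"] * len(labels)))
```

The always-"yes" case shows why a yes-ratio is worth tracking alongside accuracy: perfect recall can hide a model that simply affirms every object it is asked about.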

Evaluation Stats
Total Models: 2
Organizations: 1
Verified Results: 0
Self-Reported: 2
Benchmark Details
Max Score: 1 (shown as 100% in the percentages below)
Language: en
Performance Overview
Score distribution and top performers

Score Distribution

Models evaluated: 2
Top Score: 86.1%
Average Score: 85.9%
High Performers (80%+): 2

Top Organizations

#1 Microsoft: 2 models, average score 85.9%
Leaderboard
2 models ranked by performance on POPE

Rank  Date          License  Score
1     Aug 23, 2024  MIT      86.1%
2     Feb 1, 2025   MIT      85.6%
Resources