MME

multimodal
About

MME (Multimodal Large Language Model Evaluation) is a comprehensive benchmark measuring both perception and cognition abilities across 14 subtasks. It features manually designed instruction-answer pairs to avoid data leakage, evaluating 30+ advanced MLLMs on tasks ranging from basic visual recognition to complex reasoning. The benchmark reveals significant room for improvement in current multimodal models and provides quantitative analysis for model optimization.
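For context, the MME paper scores each subtask from paired yes/no questions per image: "accuracy" is computed per question and "accuracy+" requires both questions for an image to be correct, giving a subtask score of up to 200. The sketch below is a minimal illustration of that scheme, not code from this page; the names (Sample, score_subtask) are illustrative, and the final normalization to a percentage (matching this page's max score of 1) is an assumption about how this leaderboard reports results.

```python
# Minimal sketch of MME-style subtask scoring (per the MME paper's published
# scheme, not taken from this page). Names here are illustrative.
from dataclasses import dataclass

@dataclass
class Sample:
    image_id: str
    prediction: str   # model answer: "yes" or "no"
    answer: str       # ground-truth answer: "yes" or "no"

def score_subtask(samples: list[Sample]) -> float:
    """Return the subtask score: accuracy + accuracy+, max 200."""
    # Per-question accuracy.
    correct = sum(s.prediction == s.answer for s in samples)
    accuracy = 100.0 * correct / len(samples)

    # Per-image accuracy+: both questions for an image must be correct.
    by_image: dict[str, list[bool]] = {}
    for s in samples:
        by_image.setdefault(s.image_id, []).append(s.prediction == s.answer)
    both_correct = sum(all(flags) for flags in by_image.values())
    accuracy_plus = 100.0 * both_correct / len(by_image)

    return accuracy + accuracy_plus

# Assumed normalization for this page: score / max score, e.g. 150 / 200 = 75%.
```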

Evaluation Stats
Total Models: 3
Organizations: 1
Verified Results: 0
Self-Reported: 3
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers

Score Distribution (3 models)
Top Score: 22.5%
Average Score: 21.0%
High Performers (80%+): 0

Top Organizations

#1 DeepSeek: 3 models, 21.0% average score
Leaderboard
3 models ranked by performance on MME
Rank  Organization  Score   Date
#1    deepseek      22.5%   Dec 13, 2024
#2    deepseek      21.2%   Dec 13, 2024
#3    deepseek      19.1%   Dec 13, 2024
Resources