MME

multimodal
About

MME (Multimodal Large Language Model Evaluation) is a comprehensive benchmark measuring both perception and cognition abilities across 14 subtasks. It features manually designed instruction-answer pairs to avoid data leakage, evaluating 30+ advanced MLLMs on tasks ranging from basic visual recognition to complex reasoning. The benchmark reveals significant room for improvement in current multimodal models and provides quantitative analysis for model optimization.
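For context, the MME paper scores each subtask from paired yes/no questions per image: "accuracy" is computed per question and "accuracy+" requires both questions for an image to be correct, giving a subtask score of up to 200. The sketch below is a minimal illustration of that scheme, not code from this page; the names (Sample, score_subtask) are illustrative, and the final normalization to a percentage (matching this page's max score of 1) is an assumption about how this leaderboard reports results.

```python
# Minimal sketch of MME-style subtask scoring (per the MME paper's published
# scheme, not taken from this page). Names here are illustrative.
from dataclasses import dataclass

@dataclass
class Sample:
    image_id: str
    prediction: str   # model answer: "yes" or "no"
    answer: str       # ground-truth answer: "yes" or "no"

def score_subtask(samples: list[Sample]) -> float:
    """Return the subtask score: accuracy + accuracy+, max 200."""
    # Per-question accuracy.
    correct = sum(s.prediction == s.answer for s in samples)
    accuracy = 100.0 * correct / len(samples)

    # Per-image accuracy+: both questions for an image must be correct.
    by_image: dict[str, list[bool]] = {}
    for s in samples:
        by_image.setdefault(s.image_id, []).append(s.prediction == s.answer)
    both_correct = sum(all(flags) for flags in by_image.values())
    accuracy_plus = 100.0 * both_correct / len(by_image)

    return accuracy + accuracy_plus

# Assumed normalization for this page: score / max score, e.g. 150 / 200 = 75%.
```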

Evaluation Stats
Total Models: 3
Organizations: 1
Verified Results: 0
Self-Reported: 3
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers

Score Distribution (3 models)
Top Score: 22.5%
Average Score: 21.0%
High Performers (80%+): 0

Top Organizations

#1 DeepSeek: 3 models, 21.0% average score
Leaderboard
3 models ranked by performance on MME
Rank  Organization  Score   Date
#1    deepseek      22.5%   Dec 13, 2024
#2    deepseek      21.2%   Dec 13, 2024
#3    deepseek      19.1%   Dec 13, 2024
Resources