Video-MME

Tags: Multilingual, Multimodal
About

Video-MME is the first comprehensive multimodal evaluation benchmark for video analysis, featuring 900 manually annotated videos spanning 254 hours and 2,700 question-answer pairs. This full-spectrum evaluation tests models' video understanding across diverse video types, durations, and multimodal inputs, including subtitles and audio.
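
In practice, evaluation on a benchmark like this reduces to multiple-choice accuracy over the 2,700 question-answer pairs. Below is a minimal sketch of that scoring loop; the record schema (field names such as "question", "options", and "answer") is an assumption for illustration, not the official Video-MME release format.

```python
# Minimal sketch of multiple-choice accuracy scoring, Video-MME style.
# NOTE: the record fields below ("question", "options", "answer") are
# hypothetical; consult the official dataset for the real schema.

def accuracy(records, predict):
    """Fraction of questions where the predicted option letter matches the key."""
    correct = sum(1 for record in records if predict(record) == record["answer"])
    return correct / len(records)

# Toy example: a baseline that always answers "A" scores 50% on this pair.
sample = [
    {"question": "What appears first in the clip?", "options": ["A. ...", "B. ..."], "answer": "A"},
    {"question": "What happens at the end?", "options": ["A. ...", "B. ..."], "answer": "B"},
]
print(accuracy(sample, lambda record: "A"))  # 0.5
```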

Evaluation Stats

Total Models: 5
Organizations: 2
Verified Results: 0
Self-Reported: 5
Benchmark Details

Max Score: 1
Language: en
Performance Overview
Score distribution and top performers

Score Distribution

Models: 5
Top Score: 84.8%
Average Score: 72.1%
High Performers (80%+): 1

Top Organizations

#1 Google: 4 models, 76.4% average
#2 Microsoft: 1 model, 55.0% average
Leaderboard
5 models ranked by performance on Video-MME

Rank  Release Date  License      Score
1     May 20, 2025  Proprietary  84.8%
2     May 1, 2024   Proprietary  78.6%
3     May 1, 2024   Proprietary  76.1%
4     Mar 15, 2024  Proprietary  66.2%
5     Feb 1, 2025   MIT          55.0%
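
As a sanity check, the Performance Overview figures above follow directly from these five scores. The sketch below recomputes them; the grouping of the four proprietary entries under Google is an inference from the organization averages, not stated in the table itself.

```python
# Recompute the headline statistics from the leaderboard scores.
scores = [84.8, 78.6, 76.1, 66.2, 55.0]

top = max(scores)                               # 84.8 -> "Top Score: 84.8%"
average = sum(scores) / len(scores)             # 72.14 -> reported as 72.1%
high_performers = sum(s >= 80 for s in scores)  # 1 model at 80%+

# Assumption: the four proprietary entries are Google's, which matches
# the 76.4% organization average shown above (305.7 / 4 = 76.425).
google_avg = sum([84.8, 78.6, 76.1, 66.2]) / 4

print(f"Top: {top}%  Average: {average:.1f}%  High performers: {high_performers}")
print(f"Google average: {google_avg:.1f}%")
```
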
Resources