Video-MME

Tags: Multilingual, Multimodal
About

Video-MME is the first comprehensive multimodal evaluation benchmark for video analysis, featuring 900 manually annotated videos spanning 254 hours and 2,700 question-answer pairs. This full-spectrum evaluation tests models' video understanding across diverse video types, durations, and multimodal inputs, including subtitles and audio.
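
In practice, evaluation on a benchmark like this reduces to multiple-choice accuracy over the 2,700 question-answer pairs. Below is a minimal sketch of that scoring loop; the record schema (field names such as "question", "options", and "answer") is an assumption for illustration, not the official Video-MME release format.

```python
# Minimal sketch of multiple-choice accuracy scoring, Video-MME style.
# NOTE: the record fields below ("question", "options", "answer") are
# hypothetical; consult the official dataset for the real schema.

def accuracy(records, predict):
    """Fraction of questions where the predicted option letter matches the key."""
    correct = sum(1 for record in records if predict(record) == record["answer"])
    return correct / len(records)

# Toy example: a baseline that always answers "A" scores 50% on this pair.
sample = [
    {"question": "What appears first in the clip?", "options": ["A. ...", "B. ..."], "answer": "A"},
    {"question": "What happens at the end?", "options": ["A. ...", "B. ..."], "answer": "B"},
]
print(accuracy(sample, lambda record: "A"))  # 0.5
```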

Evaluation Stats

Total Models: 5
Organizations: 2
Verified Results: 0
Self-Reported: 5
Benchmark Details

Max Score: 1
Language: en
Performance Overview
Score distribution and top performers

Score Distribution

Models: 5
Top Score: 84.8%
Average Score: 72.1%
High Performers (80%+): 1

Top Organizations

#1 Google: 4 models, 76.4% average
#2 Microsoft: 1 model, 55.0% average
Leaderboard
5 models ranked by performance on Video-MME

Rank  Release Date  License      Score
1     May 20, 2025  Proprietary  84.8%
2     May 1, 2024   Proprietary  78.6%
3     May 1, 2024   Proprietary  76.1%
4     Mar 15, 2024  Proprietary  66.2%
5     Feb 1, 2025   MIT          55.0%
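
As a sanity check, the Performance Overview figures above follow directly from these five scores. The sketch below recomputes them; the grouping of the four proprietary entries under Google is an inference from the organization averages, not stated in the table itself.

```python
# Recompute the headline statistics from the leaderboard scores.
scores = [84.8, 78.6, 76.1, 66.2, 55.0]

top = max(scores)                               # 84.8 -> "Top Score: 84.8%"
average = sum(scores) / len(scores)             # 72.14 -> reported as 72.1%
high_performers = sum(s >= 80 for s in scores)  # 1 model at 80%+

# Assumption: the four proprietary entries are Google's, which matches
# the 76.4% organization average shown above (305.7 / 4 = 76.425).
google_avg = sum([84.8, 78.6, 76.1, 66.2]) / 4

print(f"Top: {top}%  Average: {average:.1f}%  High performers: {high_performers}")
print(f"Google average: {google_avg:.1f}%")
```
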
Resources