Video-MME
Multilingual · Multimodal
About
Video-MME is the first comprehensive multimodal evaluation benchmark for video analysis: 900 manually annotated videos spanning 254 hours, paired with 2,700 question-answer pairs. It tests models across diverse video types, durations, and multimodal inputs, including subtitles and audio, for a full-spectrum assessment of video understanding.
Evaluation Stats
Total Models: 5
Organizations: 2
Verified Results: 0
Self-Reported: 5
Benchmark Details
Max Score: 1
Language: en
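
A percentage on this page is a model's accuracy over the benchmark's 2,700 multiple-choice QA pairs, on the 0-1 scale implied by the max score of 1. A minimal sketch of that scoring, assuming predictions and answer keys arrive as simple lists of option letters (an illustrative layout, not the benchmark's official data format):

```python
# Minimal sketch of multiple-choice benchmark scoring: mean accuracy
# on a 0-1 scale (max score = 1), displayed as a percentage.
# The list-of-letters layout is an illustrative assumption, not
# Video-MME's official data format.

def score(predictions: list[str], answers: list[str]) -> float:
    """Fraction of questions where the predicted option matches the key."""
    assert len(predictions) == len(answers), "one prediction per question"
    correct = sum(p.strip().upper() == a.strip().upper()
                  for p, a in zip(predictions, answers))
    return correct / len(answers)

# Example: 3 of 4 hypothetical questions correct -> 0.75, shown as 75.0%
acc = score(["A", "C", "B", "D"], ["A", "C", "B", "A"])
print(f"{acc:.3f} ({acc:.1%})")
```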
Performance Overview
Score distribution and top performers
Score Distribution (5 models)
Top Score: 84.8%
Average Score: 72.1%
High Performers (80%+): 1

Top Organizations
#1 Google: 4 models, avg 76.4%
#2 Microsoft: 1 model, avg 55.0%

These aggregates follow directly from the leaderboard scores; see the sketch after the table below.
Leaderboard
5 models ranked by performance on Video-MME
| Rank | Date | License | Score |
|---|---|---|---|
| 1 | May 20, 2025 | Proprietary | 84.8% |
| 2 | May 1, 2024 | Proprietary | 78.6% |
| 3 | May 1, 2024 | Proprietary | 76.1% |
| 4 | Mar 15, 2024 | Proprietary | 66.2% |
| 5 | Feb 1, 2025 | MIT | 55.0% |
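
The Performance Overview aggregates can be reproduced from these five scores. A quick sketch; the grouping by organization is inferred from the overview section (Microsoft's single model matches the 55.0% MIT-licensed entry, leaving the other four rows to Google):

```python
# Reproduce the Performance Overview aggregates from the leaderboard.
# Org grouping is inferred from the overview section, not listed per row.
scores = {
    "Google": [84.8, 78.6, 76.1, 66.2],
    "Microsoft": [55.0],
}

flat = [s for org_scores in scores.values() for s in org_scores]
print(f"Top Score: {max(flat):.1f}%")                           # 84.8%
print(f"Average Score: {sum(flat) / len(flat):.1f}%")           # 72.1%
print(f"High Performers (80%+): {sum(s >= 80 for s in flat)}")  # 1

for org, ss in sorted(scores.items(), key=lambda kv: -max(kv[1])):
    print(f"{org}: {len(ss)} model(s), avg {sum(ss) / len(ss):.1f}%")
# Google: 4 model(s), avg 76.4%
# Microsoft: 1 model(s), avg 55.0%
```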