VideoMME w sub.

multimodal

About

VideoMME w/ Sub is the subtitle-enhanced variant of the VideoMME benchmark: it supplies textual captions and subtitles alongside each video. It tests how well AI models integrate this text with the visual and audio content when analyzing a video and answering questions about it.

Evaluation Stats

Total Models: 4

Organizations: 2

Verified Results: 0

Self-Reported: 4

Benchmark Details

Max Score: 1

Language

Performance Overview

Score distribution and top performers

Score Distribution

4 models

Top Score

86.7%

Average Score

77.2%

High Performers (80%+)

Top Organizations

#1 OpenAI

1 model

86.7%

#2 Alibaba Cloud / Qwen Team

3 models

74.0%

Leaderboard

4 models ranked by performance on VideoMME w sub.

Rank	Model	Organization	Date	License	Score
#01	GPT-5	OpenAI	Aug 7, 2025	Proprietary	86.7%
#02	Qwen2.5 VL 32B Instruct	Alibaba Cloud / Qwen Team	Feb 28, 2025	Apache 2.0	77.9%
#03	Qwen2.5-Omni-7B	Alibaba Cloud / Qwen Team	Mar 27, 2025	Apache 2.0	72.4%
#04	Qwen2.5 VL 7B Instruct	Alibaba Cloud / Qwen Team	Jan 26, 2025	Apache 2.0	71.6%
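
The summary figures in the Performance Overview (top score, average score, per-organization average, and the 80%+ count) can be recomputed from the leaderboard rows. The sketch below is illustrative only: it assumes the page uses a plain arithmetic mean and a "greater than or equal to 80" cutoff, neither of which the page states, and the names `ROWS` and `summarize` are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Leaderboard rows copied from the table: (model, organization, score in percent).
ROWS = [
    ("GPT-5", "OpenAI", 86.7),
    ("Qwen2.5 VL 32B Instruct", "Alibaba Cloud / Qwen Team", 77.9),
    ("Qwen2.5-Omni-7B", "Alibaba Cloud / Qwen Team", 72.4),
    ("Qwen2.5 VL 7B Instruct", "Alibaba Cloud / Qwen Team", 71.6),
]


def summarize(rows, high_threshold=80.0):
    """Recompute summary statistics (assumed: simple mean, >= threshold)."""
    scores = [score for _, _, score in rows]

    # Group scores by organization so each one can be averaged separately.
    by_org = defaultdict(list)
    for _, org, score in rows:
        by_org[org].append(score)

    return {
        "top_score": max(scores),
        "average_score": mean(scores),
        "high_performers": sum(score >= high_threshold for score in scores),
        "org_average": {org: mean(vals) for org, vals in by_org.items()},
    }


summary = summarize(ROWS)
print(summary)
```

Under these assumptions the overall mean is about 77.15 and the Alibaba Cloud / Qwen Team mean is about 73.97, which agree with the 77.2% and 74.0% shown on the page once rounded to one decimal place.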
