MMT-Bench

multimodal

About

MMT-Bench is a comprehensive multimodal benchmark of 31,325 curated multiple-choice visual questions designed to assess Large Vision-Language Models across a wide range of multimodal tasks that require expert knowledge. It covers 32 core meta-tasks and 162 subtasks, spanning scenarios such as vehicle driving and embodied navigation, and evaluates visual recognition, localization, reasoning, and planning capabilities, with the aim of advancing general-purpose multimodal intelligence.

Evaluation Stats

Total Models: 4

Organizations: 2

Verified Results: 0

Self-Reported: 4

Benchmark Details

Max Score: 1

Language

Performance Overview

Score distribution and top performers

Score Distribution

4 models

Top Score

63.6%

Average Score

60.8%

High Performers (80%+)

0

Top Organizations

#1 Alibaba Cloud / Qwen Team

1 model

63.6%

#2 DeepSeek

3 models

59.9%

Leaderboard

4 models ranked by performance on MMT-Bench

Rank	Model	Organization	Release Date	License	Score
#01	DeepSeek VL2	DeepSeek	Dec 13, 2024	deepseek	63.6%
#02	Qwen2.5 VL 7B Instruct	Alibaba Cloud / Qwen Team	Jan 26, 2025	Apache 2.0	63.6%
#03	DeepSeek VL2 Small	DeepSeek	Dec 13, 2024	deepseek	62.9%
#04	DeepSeek VL2 Tiny	DeepSeek	Dec 13, 2024	deepseek	53.2%
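
The summary figures on this page (top score, average score, per-organization scores) are consistent with simple unweighted means over the four leaderboard rows. The page does not state how they are computed, so the following is only a minimal sketch under that assumption; the `LEADERBOARD` list and the `summarize` helper are hypothetical names introduced for illustration, with scores copied from the rows above.

```python
from collections import defaultdict
from statistics import mean

# Scores (in percent) copied from the leaderboard rows above.
LEADERBOARD = [
    ("DeepSeek VL2", "DeepSeek", 63.6),
    ("Qwen2.5 VL 7B Instruct", "Alibaba Cloud / Qwen Team", 63.6),
    ("DeepSeek VL2 Small", "DeepSeek", 62.9),
    ("DeepSeek VL2 Tiny", "DeepSeek", 53.2),
]


def summarize(rows, high_threshold=80.0):
    """Recompute the page's summary stats, assuming unweighted means."""
    scores = [score for _, _, score in rows]

    # Group scores by organization to get one average per organization.
    by_org = defaultdict(list)
    for _, org, score in rows:
        by_org[org].append(score)

    return {
        "top_score": max(scores),
        "average_score": round(mean(scores), 1),
        # Number of models at or above the "high performer" threshold.
        "high_performers": sum(score >= high_threshold for score in scores),
        "org_average": {org: round(mean(v), 1) for org, v in by_org.items()},
    }


summary = summarize(LEADERBOARD)
print(summary)
```

Under this assumption the recomputed values match the page: a top score of 63.6%, an average of 60.8%, 59.9% for DeepSeek across its three models, and no model at or above the 80% threshold.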

Resources

Research Paper