MMBench-V1.1

Tags: Multilingual, Multimodal
About

MMBench v1.1 is an updated version of the MMBench benchmark featuring enhanced evaluation protocols and expanded question diversity for vision-language model assessment. This iteration incorporates improvements based on community feedback, refined evaluation metrics, and additional bilingual visual reasoning tasks to provide more comprehensive and accurate evaluation of multimodal capabilities.

Evaluation Stats
Total Models: 4
Organizations: 2
Verified Results: 0
Self-Reported: 4
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers

Score Distribution (4 models)
Top Score: 81.8%
Average Score: 77.2%
High Performers (80%+): 1

Top Organizations

#1 Alibaba Cloud / Qwen Team (1 model): 81.8%
#2 DeepSeek (3 models): 75.6%
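The headline numbers in the two panels above are simple aggregates of the four leaderboard scores. A minimal sketch reproducing them in plain Python (the variable names are illustrative and not part of any MMBench tooling):

```python
# Sanity-check the summary statistics against the four self-reported scores.
# The score lists below are copied from the leaderboard on this page.
scores = [81.8, 79.3, 79.2, 68.3]      # all four models
deepseek_scores = [79.3, 79.2, 68.3]   # the three DeepSeek entries

top = max(scores)                                # 81.8 -> "Top Score"
avg = sum(scores) / len(scores)                  # 77.15, shown rounded as 77.2%
high_performers = sum(s >= 80 for s in scores)   # 1 model scores 80% or above
deepseek_avg = sum(deepseek_scores) / len(deepseek_scores)  # 75.6 for #2

print(f"top={top} avg={avg:.2f} high={high_performers} deepseek_avg={deepseek_avg:.2f}")
```

Note that the displayed 77.2% average is the exact mean 77.15 rounded to one decimal place.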
Leaderboard
4 models ranked by performance on MMBench-V1.1

Date          License     Score
Mar 27, 2025  Apache 2.0  81.8%
Dec 13, 2024  deepseek    79.3%
Dec 13, 2024  deepseek    79.2%
Dec 13, 2024  deepseek    68.3%
Resources