MMStar

multimodal

About

MMStar is a vision-indispensable multimodal benchmark of 1,500 challenging samples, each selected by human reviewers. It evaluates large vision-language models on tasks that genuinely require visual understanding and excludes questions that can be answered from the text alone. Because every evaluation item depends on the image, the benchmark gives a more accurate assessment of a model's true multimodal capabilities.
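The vision-indispensability criterion can be illustrated with a minimal sketch. It assumes each candidate question has already been posed to several text-only models without its image, and that a human reviewer has flagged whether the answer truly depends on the image. The `Sample` fields, the `is_vision_indispensable` helper, and the `max_text_only_correct` threshold are all hypothetical; this is not MMStar's published curation pipeline.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    sample_id: str
    # Hypothetical field: how many text-only models (given the question
    # but NOT the image) answered this sample correctly.
    text_only_correct: int
    # Hypothetical field: a human reviewer confirmed the answer needs the image.
    human_verified_visual: bool


def is_vision_indispensable(sample: Sample, max_text_only_correct: int = 2) -> bool:
    """Keep a sample only if text alone rarely suffices AND a human confirmed
    visual dependency. The default threshold is an assumption, not MMStar's value."""
    return (sample.text_only_correct <= max_text_only_correct
            and sample.human_verified_visual)


def filter_candidates(samples: list[Sample], max_text_only_correct: int = 2) -> list[Sample]:
    # Drop every candidate that fails the vision-indispensability check.
    return [s for s in samples if is_vision_indispensable(s, max_text_only_correct)]


candidates = [
    Sample("q1", text_only_correct=0, human_verified_visual=True),
    Sample("q2", text_only_correct=5, human_verified_visual=True),   # answerable from text alone
    Sample("q3", text_only_correct=1, human_verified_visual=False),  # rejected by reviewer
]
kept = filter_candidates(candidates)
print([s.sample_id for s in kept])  # ['q1']
```

Requiring both conditions mirrors the two claims in the description: text alone should not suffice, and humans confirm that the image is needed.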

Evaluation Stats

Total Models: 7

Organizations: 2

Verified Results: 0

Self-Reported: 7

Benchmark Details

Max Score: 1 (scores below are shown as percentages)


Performance Overview

Score distribution and top performers

Score Distribution: 7 models

Top Score: 70.8%

Average Score: 61.8%

High Performers (80%+): 0

Top Organizations

#1 Alibaba Cloud / Qwen Team: 4 models, average score 67.1%

#2 DeepSeek: 3 models, average score 54.7%

Leaderboard

7 models ranked by performance on MMStar

Rank	Model	Organization	Release Date	License	Score
#01	Qwen2.5 VL 72B Instruct	Alibaba Cloud / Qwen Team	Jan 26, 2025	tongyi-qianwen	70.8%
#02	Qwen2.5 VL 32B Instruct	Alibaba Cloud / Qwen Team	Feb 28, 2025	Apache 2.0	69.5%
#03	Qwen2.5-Omni-7B	Alibaba Cloud / Qwen Team	Mar 27, 2025	Apache 2.0	64.0%
#04	Qwen2.5 VL 7B Instruct	Alibaba Cloud / Qwen Team	Jan 26, 2025	Apache 2.0	63.9%
#05	DeepSeek VL2	DeepSeek	Dec 13, 2024	deepseek	61.3%
#06	DeepSeek VL2 Small	DeepSeek	Dec 13, 2024	deepseek	57.0%
#07	DeepSeek VL2 Tiny	DeepSeek	Dec 13, 2024	deepseek	45.9%
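The summary figures on this page (average score and per-organization averages) are consistent with plain unweighted means of the leaderboard scores: the seven scores sum to 432.4, giving 61.77% overall; Qwen's four scores average 67.05%; DeepSeek's three average 54.73%. The sketch below recomputes these values under that assumption. It is not the site's actual code, and `summarize` is a hypothetical helper.

```python
from collections import defaultdict

# (model, organization, score in %) copied from the leaderboard table.
LEADERBOARD = [
    ("Qwen2.5 VL 72B Instruct", "Alibaba Cloud / Qwen Team", 70.8),
    ("Qwen2.5 VL 32B Instruct", "Alibaba Cloud / Qwen Team", 69.5),
    ("Qwen2.5-Omni-7B", "Alibaba Cloud / Qwen Team", 64.0),
    ("Qwen2.5 VL 7B Instruct", "Alibaba Cloud / Qwen Team", 63.9),
    ("DeepSeek VL2", "DeepSeek", 61.3),
    ("DeepSeek VL2 Small", "DeepSeek", 57.0),
    ("DeepSeek VL2 Tiny", "DeepSeek", 45.9),
]


def summarize(rows, high_threshold=80.0):
    """Recompute page statistics as unweighted means (an assumption)."""
    scores = [score for _, _, score in rows]
    by_org = defaultdict(list)
    for _, org, score in rows:
        by_org[org].append(score)
    return {
        "models": len(rows),
        "top": max(scores),
        "average": sum(scores) / len(scores),
        # Count of models at or above the "High Performers" cutoff.
        "high_performers": sum(1 for s in scores if s >= high_threshold),
        "org_count": {org: len(v) for org, v in by_org.items()},
        "org_average": {org: sum(v) / len(v) for org, v in by_org.items()},
    }


stats = summarize(LEADERBOARD)
print(f"{stats['average']:.1f}")  # 61.8
print(f"{stats['org_average']['DeepSeek']:.1f}")  # 54.7
```

The Qwen average is exactly 67.05 in decimal arithmetic, so whether it displays as 67.0 or 67.1 depends on the rounding rule; the page shows 67.1%, which matches rounding half up.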

Resources

Research Paper