MMMU

Multimodal

About

MMMU evaluates multimodal models on 11.5K college-level questions spanning 30 subjects. The questions require both visual perception and domain-specific knowledge in art, science, business, health, and engineering.

Evaluation Stats

Total Models: 16

Organizations: 9

Verified Results: 0

Self-Reported: 0

Benchmark Details

Max Score: 100

Performance Overview

Score distribution and top performers

Score Distribution

16 models

Top Score

78.4%

Average Score

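70.0% (across the 7 models with a reported MMMU score; counting the 9 models without a score as -1.0 gives 30.1%)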

High Performers (80%+)

0
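The summary figures above can be recomputed from the per-model scores listed on this page. The following is a minimal sketch, not the site's actual aggregation code: it assumes that a score of -1.0 in the source data is a placeholder for "no reported MMMU score" (the page does not define it), and represents those entries as `None`. The variable names are illustrative.

```python
from statistics import mean

# MMMU scores as listed on this page. None marks a model whose score
# appears as -1.0 in the source data (assumed to mean "not reported").
scores = {
    "GPT-5": 78.4,
    "o3": 76.4,
    "GPT-5.1": 76.0,
    "Qwen3-VL-235B-A22B": 68.1,
    "Gemini 2.5 Pro": 68.0,
    "Seed 1.5-VL": 67.6,
    "Skywork-R1V3-38B": 55.4,
    "Claude Opus 4.5": None,
    "o4 mini": None,
    "Gemini 2.5 Flash": None,
    "o1": None,
    "Grok 3": None,
    "Claude Sonnet 4.5": None,
    "InternS1": None,
    "Llama 4 Behemoth": None,
    "Claude Opus 4.1": None,
}

# Keep only the models that actually have a score.
reported = [s for s in scores.values() if s is not None]

# Average with the -1.0 placeholder counted as if it were a real score.
naive_average = mean(-1.0 if s is None else s for s in scores.values())

# Average over reported scores only.
reported_average = mean(reported)

top_score = max(reported)
high_performers = sum(s >= 80 for s in reported)

print(f"models with a reported score: {len(reported)} of {len(scores)}")
print(f"average including placeholders: {naive_average:.1f}")
print(f"average over reported scores:   {reported_average:.1f}")
print(f"top score: {top_score}, models at 80%+: {high_performers}")
```

Under these assumptions, averaging all 16 entries with the placeholders included gives about 30.1, while averaging only the seven reported scores gives about 70.0. No model reaches the 80% threshold.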

Top Organizations

#1 OpenAI

5 models (3 with a reported score)

76.9%

#2 Alibaba / Qwen

1 model

68.1%

#3 Google DeepMind

2 models (1 with a reported score)

68.0%

#4 ByteDance

1 model

67.6%

#5 Kunlun Tech

1 model

55.4%
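The per-organization averages depend on how models without a reported score are handled. The sketch below is illustrative only and is not the site's ranking code. It assumes that -1.0 in the source data means "no reported score" (stored here as `None`), and that o1 is the fifth OpenAI model, which is inferred from the "5 models" count rather than stated in the leaderboard rows. The helper `org_average` is a hypothetical name.

```python
from collections import defaultdict
from statistics import mean

# (model, organization, MMMU score); None = score shown as -1.0 in the source.
rows = [
    ("GPT-5", "OpenAI", 78.4),
    ("o3", "OpenAI", 76.4),
    ("GPT-5.1", "OpenAI", 76.0),
    ("Qwen3-VL-235B-A22B", "Alibaba / Qwen", 68.1),
    ("Gemini 2.5 Pro", "Google DeepMind", 68.0),
    ("Seed 1.5-VL", "ByteDance", 67.6),
    ("Skywork-R1V3-38B", "Kunlun Tech", 55.4),
    ("Claude Opus 4.5", "Anthropic", None),
    ("o4 mini", "OpenAI", None),
    ("Gemini 2.5 Flash", "Google DeepMind", None),
    ("o1", "OpenAI", None),  # organization inferred, see lead-in
]

# Group scores by organization.
by_org = defaultdict(list)
for _model, org, score in rows:
    by_org[org].append(score)


def org_average(scores, placeholder=None):
    """Mean score rounded to one decimal.

    Missing entries are skipped unless a placeholder value is given,
    in which case each missing entry is replaced by that value.
    Returns None when there is nothing to average.
    """
    if placeholder is not None:
        values = [placeholder if s is None else s for s in scores]
    else:
        values = [s for s in scores if s is not None]
    return round(mean(values), 1) if values else None


# Rank organizations by the average of their reported scores only.
averages = {org: org_average(s) for org, s in by_org.items()}
ranking = sorted(
    ((org, avg) for org, avg in averages.items() if avg is not None),
    key=lambda item: item[1],
    reverse=True,
)

for rank, (org, avg) in enumerate(ranking, start=1):
    print(f"#{rank} {org}: {avg}%")
```

With these inputs, counting each missing score as -1.0 yields 45.8 for OpenAI and 33.5 for Google DeepMind, while skipping missing scores yields 76.9 and 68.0. Organizations with no reported score at all are left out of the ranking instead of being assigned a negative average.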

Leaderboard

16 models ranked by performance on MMMU

Rank	Model	Organization	Release Date	License	Score
#01	GPT-5	OpenAI	Aug 7, 2025	Proprietary	78.4%
#02	o3	OpenAI	Apr 16, 2025	Proprietary	76.4%
#03	GPT-5.1	OpenAI	Nov 12, 2025	Proprietary	76.0%
#04	Qwen3-VL-235B-A22B	Alibaba / Qwen	Sep 23, 2025	Apache 2.0	68.1%
#05	Gemini 2.5 Pro	Google DeepMind	Mar 25, 2025	Proprietary	68.0%
#06	Seed 1.5-VL	ByteDance	May 15, 2025	Proprietary	67.6%
#07	Skywork-R1V3-38B	Kunlun Tech	Jul 9, 2025	Apache 2.0	55.4%
#08	Claude Opus 4.5	Anthropic	Nov 24, 2025	Proprietary	N/A
#09	o4 mini	OpenAI	Apr 16, 2025	Proprietary	N/A
#10	Gemini 2.5 Flash	Google DeepMind	Apr 17, 2025	Proprietary	N/A

Showing 1 to 10 of 16 models

Additional Metrics

Extended metrics for top models on MMMU

Model	Score	MMMU-Pro
GPT-5	78.4%	84.2%
o3	76.4%	82.9%
GPT-5.1	76.0%	85.4%
Qwen3-VL-235B-A22B	68.1%	78.7%
Gemini 2.5 Pro	68.0%	79.6%
Seed 1.5-VL	67.6%	77.9%
Skywork-R1V3-38B	55.4%	76.0%
Claude Opus 4.5	N/A	80.7%
o4 mini	N/A	81.6%
Gemini 2.5 Flash	N/A	79.7%
o1	N/A	78.2%
Grok 3	N/A	78.0%
Claude Sonnet 4.5	N/A	77.8%
InternS1	N/A	77.7%
Llama 4 Behemoth	N/A	76.1%
Claude Opus 4.1	N/A	76.5%
