MMMU
Multimodal
About
MMMU evaluates multimodal models on 11.5K college-level questions across 30 subjects requiring visual perception and domain-specific knowledge in art, science, business, health, and engineering.
Evaluation Stats
Total Models: 16
Organizations: 9
Verified Results: 0
Self-Reported: 0
Benchmark Details
Max Score: 100
Performance Overview
Score distribution and top performers
Score Distribution (16 models)
Top Score: 78.4%
Average Score: 30.1%
High Performers (80%+): 0

Top Organizations
| Rank | Organization | Models | Score |
|---|---|---|---|
| #1 | Alibaba / Qwen | 1 | 68.1% |
| #2 | ByteDance | 1 | 67.6% |
| #3 | Kunlun Tech | 1 | 55.4% |
| #4 | OpenAI | 5 | 45.8% |
| #5 | Google DeepMind | 2 | 33.5% |
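
The organization scores above appear to be per-organization means over every listed model, including the -1.0 placeholder the leaderboard shows for models without an MMMU score. The Python sketch below reproduces that aggregation for the five organizations shown. The model-to-organization mapping and the name `org_ranking` are assumptions made for illustration; the page does not document how it computes these figures.

```python
from collections import defaultdict
from statistics import mean

# (organization, model, MMMU score) rows; the grouping is an assumption,
# and -1.0 is the placeholder the page shows when no score is available.
ROWS = [
    ("Alibaba / Qwen", "Qwen3-VL-235B-A22B", 68.1),
    ("ByteDance", "Seed 1.5-VL", 67.6),
    ("Kunlun Tech", "Skywork-R1V3-38B", 55.4),
    ("OpenAI", "GPT-5", 78.4),
    ("OpenAI", "o3", 76.4),
    ("OpenAI", "GPT-5.1", 76.0),
    ("OpenAI", "o4 mini", -1.0),
    ("OpenAI", "o1", -1.0),
    ("Google DeepMind", "Gemini 2.5 Pro", 68.0),
    ("Google DeepMind", "Gemini 2.5 Flash", -1.0),
]


def org_ranking(rows):
    """Return (organization, model_count, mean_score) sorted by mean score."""
    by_org = defaultdict(list)
    for org, _model, score in rows:
        by_org[org].append(score)
    summary = [(org, len(s), round(mean(s), 1)) for org, s in by_org.items()]
    return sorted(summary, key=lambda row: row[2], reverse=True)


if __name__ == "__main__":
    for rank, (org, count, score) in enumerate(org_ranking(ROWS), start=1):
        print(f"#{rank} {org}: {count} model(s), {score}%")
```

With placeholders included, the sketch yields 45.8% for OpenAI and 33.5% for Google DeepMind, matching the listed values. This suggests those two figures are pulled down by models with no score rather than by low accuracy.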
Leaderboard
16 models ranked by performance on MMMU
| Model | Release Date | License | Score |
|---|---|---|---|
| GPT-5 | Aug 7, 2025 | Proprietary | 78.4% |
| o3 | Apr 16, 2025 | Proprietary | 76.4% |
| GPT-5.1 | Nov 12, 2025 | Proprietary | 76.0% |
| Qwen3-VL-235B-A22B | Sep 23, 2025 | Apache 2.0 | 68.1% |
| Gemini 2.5 Pro | Mar 25, 2025 | Proprietary | 68.0% |
| Seed 1.5-VL | May 15, 2025 | Proprietary | 67.6% |
| Skywork-R1V3-38B | Jul 9, 2025 | Apache 2.0 | 55.4% |
| Claude Opus 4.5 | Nov 24, 2025 | Proprietary | -1.0% |
| o4 mini | Apr 16, 2025 | Proprietary | -1.0% |
| Gemini 2.5 Flash | Apr 17, 2025 | Proprietary | -1.0% |
Showing 1 to 10 of 16 models
Additional Metrics
Extended metrics for top models on MMMU
| Model | Score | MMMU-Pro |
|---|---|---|
| GPT-5 | 78.4% | 84.2% |
| o3 | 76.4% | 82.9% |
| GPT-5.1 | 76.0% | 85.4% |
| Qwen3-VL-235B-A22B | 68.1% | 78.7% |
| Gemini 2.5 Pro | 68.0% | 79.6% |
| Seed 1.5-VL | 67.6% | 77.9% |
| Skywork-R1V3-38B | 55.4% | 76.0% |
| Claude Opus 4.5 | -1.0% | 80.7% |
| o4 mini | -1.0% | 81.6% |
| Gemini 2.5 Flash | -1.0% | 79.7% |
| o1 | -1.0% | 78.2% |
| Grok 3 | -1.0% | 78.0% |
| Claude Sonnet 4.5 | -1.0% | 77.8% |
| InternS1 | -1.0% | 77.7% |
| Llama 4 Behemoth | -1.0% | 76.1% |
| Claude Opus 4.1 | -1.0% | 76.5% |
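
The summary figures under Performance Overview can be cross-checked against the Score column above. The sketch below assumes that -1.0 is a "no score reported" placeholder and that the page averages it together with real scores. Neither assumption is stated on the page, and `summarize` and `MISSING` are illustrative names.

```python
from statistics import mean

MISSING = -1.0  # assumed placeholder for "no MMMU score reported"

# Scores copied from the Score column of the Additional Metrics table.
SCORES = {
    "GPT-5": 78.4, "o3": 76.4, "GPT-5.1": 76.0,
    "Qwen3-VL-235B-A22B": 68.1, "Gemini 2.5 Pro": 68.0,
    "Seed 1.5-VL": 67.6, "Skywork-R1V3-38B": 55.4,
    "Claude Opus 4.5": MISSING, "o4 mini": MISSING,
    "Gemini 2.5 Flash": MISSING, "o1": MISSING, "Grok 3": MISSING,
    "Claude Sonnet 4.5": MISSING, "InternS1": MISSING,
    "Llama 4 Behemoth": MISSING, "Claude Opus 4.1": MISSING,
}


def summarize(scores):
    """Recompute the overview statistics with and without placeholders."""
    all_scores = list(scores.values())
    reported = [s for s in all_scores if s != MISSING]
    return {
        "total_models": len(all_scores),
        "reported_models": len(reported),
        "top_score": max(all_scores),
        "mean_with_placeholders": round(mean(all_scores), 1),
        "mean_reported_only": round(mean(reported), 1),
        "high_performers": sum(s >= 80 for s in reported),
    }


if __name__ == "__main__":
    for key, value in summarize(SCORES).items():
        print(f"{key}: {value}")
```

Under these assumptions the mean including placeholders comes out at 30.1, matching the listed Average Score, while the mean over the seven models with a reported score is about 70.0. The listed average is therefore better read as a dashboard aggregate than as typical model accuracy on MMMU.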