MMLU-Pro


About

MMLU-Pro is an enhanced version of MMLU that adds more challenging, reasoning-focused questions and expands each question's choice set from four options to ten. It removes trivial questions from the original MMLU and yields more stable scores when the prompt wording varies. Model accuracy drops by 16-33% compared to standard MMLU, which exposes differences in model capability more clearly, and models generally need chain-of-thought reasoning to reach their best scores.
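To make the ten-option, chain-of-thought setup concrete, here is a minimal Python sketch of how a question might be formatted and how a final answer letter might be pulled out of a model response. The helper names (`format_question`, `extract_answer`, `chance_accuracy`), the prompt wording, and the "answer is (X)" response convention are assumptions for illustration, not the benchmark's official harness.

```python
import re
import string

# Ten answer letters, A through J (MMLU-Pro expands choices from four to ten).
OPTION_LETTERS = string.ascii_uppercase[:10]


def format_question(question, options):
    """Build a chain-of-thought style prompt (hypothetical wording)."""
    if len(options) > len(OPTION_LETTERS):
        raise ValueError("at most ten options are supported")
    lines = [f"Question: {question}", "Options:"]
    for letter, option in zip(OPTION_LETTERS, options):
        lines.append(f"{letter}. {option}")
    lines.append("Answer: Let's think step by step.")
    return "\n".join(lines)


def extract_answer(response):
    """Return the last letter stated as 'answer is (X)', or None if absent.

    Assumes the model was instructed to end with that phrase.
    """
    matches = re.findall(r"answer is \(?([A-J])\)?", response)
    return matches[-1] if matches else None


def chance_accuracy(num_options):
    """Expected accuracy of uniform random guessing."""
    return 1.0 / num_options
```

With ten options instead of four, `chance_accuracy` falls from 0.25 to 0.1, which is one reason guessing contributes less to scores on this benchmark than on the original MMLU.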

Evaluation Stats

Total Models: 68

Organizations: 12

Verified Results: 0

Self-Reported: 68

Benchmark Details

Max Score: 1

Language

Performance Overview

Score distribution and top performers

Score Distribution

68 models

Top Score: 85.0%

Average Score: 65.6%

High Performers (80%+)

Top Organizations

Rank	Organization	Models	Score
#1	Zhipu AI	2	83.0%
#2	DeepSeek	5	82.2%
#3	Moonshot AI	4	78.5%
#4	OpenAI	2	73.6%
#5	Anthropic	5	68.8%

Leaderboard

68 models ranked by performance on MMLU-Pro

Rank	Model	Organization	Date	License	Score
#01	DeepSeek-V3.2-Exp	DeepSeek	Sep 29, 2025	MIT	85.0%
#02	DeepSeek-R1-0528	DeepSeek	May 28, 2025	MIT	85.0%
#03	GLM-4.5	Zhipu AI	Jul 28, 2025	MIT	84.6%
#04	Qwen3-235B-A22B-Thinking-2507	Alibaba Cloud / Qwen Team	Jul 25, 2025	Apache 2.0	84.4%
#05	DeepSeek-V3.1	DeepSeek	Jan 10, 2025	MIT	83.7%
#06	Qwen3-235B-A22B-Instruct-2507	Alibaba Cloud / Qwen Team	Jul 22, 2025	Apache 2.0	83.0%
#07	Qwen3-Next-80B-A3B-Thinking	Alibaba Cloud / Qwen Team	Sep 10, 2025	Apache 2.0	82.7%
#08	Kimi K2 0905	Moonshot AI	Sep 5, 2025	Proprietary	82.5%
#09	GLM-4.5-Air	Zhipu AI	Jul 28, 2025	MIT	81.4%
#10	DeepSeek-V3 0324	DeepSeek	Mar 25, 2025	MIT + Model License (Commercial use allowed)	81.2%

Showing 1 to 10 of 68 models
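As a rough illustration of how the per-organization figures relate to individual model scores, the Python sketch below groups the ten visible leaderboard rows by organization and takes the mean score. The function name `org_means` is hypothetical, and the assumption that an organization's score is a simple mean of its models' scores is an inference from the numbers on this page, not something the page states.

```python
from collections import defaultdict

# (model, organization, score in percent) for the ten rows shown above.
VISIBLE_ROWS = [
    ("DeepSeek-V3.2-Exp", "DeepSeek", 85.0),
    ("DeepSeek-R1-0528", "DeepSeek", 85.0),
    ("GLM-4.5", "Zhipu AI", 84.6),
    ("Qwen3-235B-A22B-Thinking-2507", "Alibaba Cloud / Qwen Team", 84.4),
    ("DeepSeek-V3.1", "DeepSeek", 83.7),
    ("Qwen3-235B-A22B-Instruct-2507", "Alibaba Cloud / Qwen Team", 83.0),
    ("Qwen3-Next-80B-A3B-Thinking", "Alibaba Cloud / Qwen Team", 82.7),
    ("Kimi K2 0905", "Moonshot AI", 82.5),
    ("GLM-4.5-Air", "Zhipu AI", 81.4),
    ("DeepSeek-V3 0324", "DeepSeek", 81.2),
]


def org_means(rows):
    """Map each organization to (model count, mean score rounded to 0.1)."""
    scores_by_org = defaultdict(list)
    for _model, org, score in rows:
        scores_by_org[org].append(score)
    return {
        org: (len(scores), round(sum(scores) / len(scores), 1))
        for org, scores in scores_by_org.items()
    }


if __name__ == "__main__":
    for org, (count, mean) in org_means(VISIBLE_ROWS).items():
        print(f"{org}: {count} visible models, mean {mean}%")
```

Zhipu AI's two visible models (84.6% and 81.4%) average to 83.0%, which agrees with its Top Organizations entry. Organizations with models outside the top ten, such as DeepSeek with five models in total but only four shown here, will not reproduce their listed figure from the visible rows alone.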


Resources

Research Paper