MMLU

About

MMLU (Massive Multitask Language Understanding) is a multiple-choice benchmark spanning 57 academic and professional subjects, including elementary mathematics, US history, computer science, law, and medicine. Question difficulty ranges from elementary to advanced professional level, making the benchmark a standard measure of a language model's general knowledge and reasoning across disciplines.
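As a rough illustration of how the scores below are produced, here is a minimal evaluation sketch in Python. It assumes the "cais/mmlu" dataset distribution on the Hugging Face Hub (with question, choices, and answer fields); predict() is a hypothetical stand-in for the model under test, and accuracy is simply the fraction of questions answered correctly.

```python
# A minimal sketch of MMLU scoring, assuming the "cais/mmlu" dataset
# on the Hugging Face Hub; predict() is a hypothetical stand-in for
# a real model and here just picks the first choice as a baseline.
from datasets import load_dataset

def predict(question: str, choices: list[str]) -> int:
    # Hypothetical model call: return the index (0-3) of the chosen
    # answer. Replace this dummy baseline with a real model query.
    return 0

def mmlu_accuracy(config: str = "all", split: str = "test") -> float:
    ds = load_dataset("cais/mmlu", config, split=split)
    correct = sum(
        predict(row["question"], row["choices"]) == row["answer"]
        for row in ds
    )
    # Scores on this page are fractions of the max score of 1,
    # displayed as percentages.
    return correct / len(ds)

if __name__ == "__main__":
    print(f"MMLU accuracy: {mmlu_accuracy():.1%}")
```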

Evaluation Stats
Total Models: 80
Organizations: 15
Verified Results: 0
Self-Reported: 79
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers

Score Distribution

Models: 80
Top Score: 92.5%
Average Score: 79.8%
High Performers (80%+): 48
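For concreteness, a small sketch of how these summary figures follow from the per-model results; the scores list is a hypothetical placeholder for the 80 leaderboard entries, expressed as fractions of the max score of 1.

```python
# A minimal sketch, assuming a plain list of per-model scores as
# fractions of the max score of 1; the values here are placeholders.
scores = [0.925, 0.918, 0.908, 0.798]  # ...one entry per model (80 total)

top_score = max(scores)
average_score = sum(scores) / len(scores)
high_performers = sum(1 for s in scores if s >= 0.80)

print(f"Top Score: {top_score:.1%}")
print(f"Average Score: {average_score:.1%}")
print(f"High Performers (80%+): {high_performers}")
```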

Top Organizations

#1 Moonshot AI: 5 models, 88.9%
#2 OpenAI: 17 models, 86.5%
#3 xAI: 3 models, 85.0%
#4 DeepSeek: 2 models, 84.5%
#5 Anthropic: 5 models, 84.4%
Leaderboard
80 models ranked by performance on MMLU

Date            License        Score
Aug 7, 2025     Proprietary    92.5%
Dec 17, 2024    Proprietary    91.8%
Sep 12, 2024    Proprietary    90.8%
Feb 27, 2025    Proprietary    90.8%
Jun 21, 2024    Proprietary    90.4%
Oct 22, 2024    Proprietary    90.4%
Sep 5, 2025     Proprietary    90.2%
Apr 14, 2025    Proprietary    90.2%
Aug 5, 2025     Apache 2.0     90.0%
Jul 11, 2025    MIT            89.5%
Showing 1 to 10 of 80 models
...
Resources