MMLU

About

MMLU (Massive Multitask Language Understanding) is a multiple-choice benchmark spanning 57 academic and professional subjects, including elementary mathematics, US history, computer science, law, and medicine. Question difficulty ranges from elementary to advanced professional level, making the benchmark a standard measure of a language model's general knowledge and reasoning across disciplines.
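As a rough illustration of how the scores below are produced, here is a minimal evaluation sketch in Python. It assumes the "cais/mmlu" dataset distribution on the Hugging Face Hub (with question, choices, and answer fields); predict() is a hypothetical stand-in for the model under test, and accuracy is simply the fraction of questions answered correctly.

```python
# A minimal sketch of MMLU scoring, assuming the "cais/mmlu" dataset
# on the Hugging Face Hub; predict() is a hypothetical stand-in for
# a real model and here just picks the first choice as a baseline.
from datasets import load_dataset

def predict(question: str, choices: list[str]) -> int:
    # Hypothetical model call: return the index (0-3) of the chosen
    # answer. Replace this dummy baseline with a real model query.
    return 0

def mmlu_accuracy(config: str = "all", split: str = "test") -> float:
    ds = load_dataset("cais/mmlu", config, split=split)
    correct = sum(
        predict(row["question"], row["choices"]) == row["answer"]
        for row in ds
    )
    # Scores on this page are fractions of the max score of 1,
    # displayed as percentages.
    return correct / len(ds)

if __name__ == "__main__":
    print(f"MMLU accuracy: {mmlu_accuracy():.1%}")
```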

Evaluation Stats
Total Models: 80
Organizations: 15
Verified Results: 0
Self-Reported: 79
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers

Score Distribution

Models: 80
Top Score: 92.5%
Average Score: 79.8%
High Performers (80%+): 48
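For concreteness, a small sketch of how these summary figures follow from the per-model results; the scores list is a hypothetical placeholder for the 80 leaderboard entries, expressed as fractions of the max score of 1.

```python
# A minimal sketch, assuming a plain list of per-model scores as
# fractions of the max score of 1; the values here are placeholders.
scores = [0.925, 0.918, 0.908, 0.798]  # ...one entry per model (80 total)

top_score = max(scores)
average_score = sum(scores) / len(scores)
high_performers = sum(1 for s in scores if s >= 0.80)

print(f"Top Score: {top_score:.1%}")
print(f"Average Score: {average_score:.1%}")
print(f"High Performers (80%+): {high_performers}")
```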

Top Organizations

#1 Moonshot AI: 5 models, 88.9%
#2 OpenAI: 17 models, 86.5%
#3 xAI: 3 models, 85.0%
#4 DeepSeek: 2 models, 84.5%
#5 Anthropic: 5 models, 84.4%
Leaderboard
80 models ranked by performance on MMLU

Date            License        Score
Aug 7, 2025     Proprietary    92.5%
Dec 17, 2024    Proprietary    91.8%
Sep 12, 2024    Proprietary    90.8%
Feb 27, 2025    Proprietary    90.8%
Jun 21, 2024    Proprietary    90.4%
Oct 22, 2024    Proprietary    90.4%
Sep 5, 2025     Proprietary    90.2%
Apr 14, 2025    Proprietary    90.2%
Aug 5, 2025     Apache 2.0     90.0%
Jul 11, 2025    MIT            89.5%
Showing 1 to 10 of 80 models
...
Resources