MMLU
About
MMLU (Massive Multitask Language Understanding) is a benchmark of multiple-choice questions spanning 57 academic and professional subjects, including elementary mathematics, US history, computer science, law, and medicine. Question difficulty ranges from elementary to advanced professional level, making MMLU a standard measure of a language model's broad knowledge and reasoning across disciplines.
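A minimal sketch of MMLU-style scoring, assuming the standard setup of four-option multiple-choice questions graded by exact match, with per-subject accuracy macro-averaged into an overall score (function and sample data below are illustrative, not the actual evaluation harness):

```python
from collections import defaultdict

def mmlu_accuracy(results):
    """results: list of (subject, predicted_letter, answer_letter) tuples.

    Returns per-subject accuracy and the macro-average across subjects.
    """
    per_subject = defaultdict(lambda: [0, 0])  # subject -> [correct, total]
    for subject, predicted, answer in results:
        per_subject[subject][1] += 1
        if predicted == answer:
            per_subject[subject][0] += 1
    subject_acc = {s: c / t for s, (c, t) in per_subject.items()}
    # Macro-average: each subject weighted equally, regardless of size.
    overall = sum(subject_acc.values()) / len(subject_acc)
    return subject_acc, overall

# Illustrative results for two subjects.
results = [
    ("us_history", "B", "B"),
    ("us_history", "A", "C"),
    ("law", "D", "D"),
    ("law", "D", "D"),
]
subject_acc, overall = mmlu_accuracy(results)
print(subject_acc)  # {'us_history': 0.5, 'law': 1.0}
print(overall)      # 0.75
```

Leaderboards may instead micro-average (weighting every question equally); the two differ when subjects have unequal question counts.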
Evaluation Stats
Total Models: 80
Organizations: 15
Verified Results: 0
Self-Reported: 79
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers
Score Distribution: 80 models
Top Score: 92.5%
Average Score: 79.8%
High Performers (80%+): 48

Top Organizations

Rank | Organization | Models | Score
---|---|---|---
#1 | Moonshot AI | 5 | 88.9%
#2 | OpenAI | 17 | 86.5%
#3 | xAI | 3 | 85.0%
#4 | DeepSeek | 2 | 84.5%
#5 | Anthropic | 5 | 84.4%
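The summary figures above (top score, average, count of high performers) can be derived from a list of per-model scores on a 0-to-1 scale, as in this sketch (the `scores` list is illustrative, not the actual leaderboard data):

```python
def summarize(scores):
    """Compute top score, mean score, and count of models at or above 80%."""
    top = max(scores)
    average = sum(scores) / len(scores)
    high_performers = sum(1 for s in scores if s >= 0.80)
    return top, average, high_performers

# Illustrative scores, expressed as fractions of the max score (1).
scores = [0.925, 0.918, 0.908, 0.755, 0.62]
top, avg, high = summarize(scores)
print(f"Top: {top:.1%}, Avg: {avg:.1%}, 80%+: {high}")
```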
Leaderboard
80 models ranked by performance on MMLU
Release Date | License | Score
---|---|---
Aug 7, 2025 | Proprietary | 92.5%
Dec 17, 2024 | Proprietary | 91.8%
Sep 12, 2024 | Proprietary | 90.8%
Feb 27, 2025 | Proprietary | 90.8%
Jun 21, 2024 | Proprietary | 90.4%
Oct 22, 2024 | Proprietary | 90.4%
Sep 5, 2025 | Proprietary | 90.2%
Apr 14, 2025 | Proprietary | 90.2%
Aug 5, 2025 | Apache 2.0 | 90.0%
Jul 11, 2025 | MIT | 89.5%
Showing 1 to 10 of 80 models
...