MMLU-Redux

About

MMLU-Redux is a refined version of the Massive Multitask Language Understanding (MMLU) benchmark that addresses issues in the original dataset through improved question curation and evaluation methodology. It aims to provide a more accurate and reliable assessment of language models' knowledge and reasoning capabilities across academic domains.

Evaluation Stats
Total Models: 17
Organizations: 3
Verified Results: 0
Self-Reported: 17
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers

Score Distribution (17 models)
Top Score: 93.8%
Average Score: 85.8%
High Performers (80%+): 13

Top Organizations
#1 Moonshot AI (2 models): 92.7%
#2 DeepSeek (3 models): 91.4%
#3 Alibaba Cloud / Qwen Team (12 models): 83.2%
Leaderboard
17 models ranked by performance on MMLU-Redux

Date          License                                        Score
Jul 25, 2025  Apache 2.0                                     93.8%
May 28, 2025  MIT                                            93.4%
Jul 22, 2025  Apache 2.0                                     93.1%
Jul 11, 2025  MIT                                            92.7%
Sep 5, 2025   MIT                                            92.7%
Sep 10, 2025  Apache 2.0                                     92.5%
Jan 10, 2025  MIT                                            91.8%
Sep 10, 2025  Apache 2.0                                     90.9%
Dec 25, 2024  MIT + Model License (Commercial use allowed)   89.1%
Apr 29, 2025  Apache 2.0                                     87.4%

Showing 1 to 10 of 17 models
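The summary figures above (top score, average, high-performer count) can be recomputed from per-model scores. Here is a minimal Python sketch using only the ten scores visible on this page; note that the site's reported 85.8% average covers all 17 models, so the partial average computed here differs.

```python
# Scores (in %) for the ten models shown on this page of the leaderboard.
visible_scores = [93.8, 93.4, 93.1, 92.7, 92.7, 92.5, 91.8, 90.9, 89.1, 87.4]

top_score = max(visible_scores)
average = round(sum(visible_scores) / len(visible_scores), 2)
high_performers = sum(1 for s in visible_scores if s >= 80.0)

print(f"Top score: {top_score}%")                 # 93.8%
print(f"Average (visible 10 only): {average}%")   # differs from the full-17 average of 85.8%
print(f"High performers (80%+): {high_performers}")
```

The top score matches the page (93.8%), while the average over just these ten entries is higher than the full-leaderboard average because the seven lower-scoring models are on later pages.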
Resources