MMLU-Redux
About
MMLU-Redux is a refined version of the Massive Multitask Language Understanding benchmark that addresses issues in the original dataset through improved question curation and evaluation methodology. It aims to provide more accurate and reliable assessment of language models' knowledge and reasoning capabilities across academic domains.
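For reference, MMLU-Redux questions can be loaded programmatically. A minimal sketch, assuming the re-annotated dataset is published on the Hugging Face Hub under the edinburgh-dawg/mmlu-redux identifier with per-subject configurations; the dataset ID, split name, and column names below are assumptions and should be checked against the dataset card:

```python
# Minimal sketch: load one MMLU-Redux subject and inspect a few questions.
# Assumptions (not confirmed by this page): the dataset is hosted on the
# Hugging Face Hub as "edinburgh-dawg/mmlu-redux", exposes one configuration
# per MMLU subject, has a "test" split, and uses fields named "question",
# "choices", and "answer".
from datasets import load_dataset

ds = load_dataset("edinburgh-dawg/mmlu-redux", "anatomy", split="test")

for row in ds.select(range(3)):
    print(row["question"])
    for i, choice in enumerate(row["choices"]):
        print(f"  {chr(65 + i)}. {choice}")
    print("answer index:", row["answer"])
```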
Evaluation Stats
Total Models: 17
Organizations: 3
Verified Results: 0
Self-Reported: 17
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers
Score Distribution: 17 models
Top Score: 93.8%
Average Score: 85.8%
High Performers (80%+): 13

Top Organizations
#1 Moonshot AI: 2 models, 92.7%
#2 DeepSeek: 3 models, 91.4%
#3 Alibaba Cloud / Qwen Team: 12 models, 83.2%
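The aggregate numbers above (top score, average, high-performer count, and per-organization figures) can be recomputed directly from leaderboard rows. A short illustrative sketch; the model names and the model-to-organization pairings below are placeholders, not the actual leaderboard entries, with scores chosen only as examples:

```python
# Illustrative sketch: recompute overview statistics from (model, organization, score) rows.
# The rows below are hypothetical examples, not the real MMLU-Redux leaderboard data.
from collections import defaultdict

rows = [
    ("model-a", "Moonshot AI", 93.8),
    ("model-b", "DeepSeek", 93.4),
    ("model-c", "Alibaba Cloud / Qwen Team", 93.1),
    ("model-d", "Moonshot AI", 91.8),
]

scores = [score for _, _, score in rows]
print(f"Top score:     {max(scores):.1f}%")
print(f"Average score: {sum(scores) / len(scores):.1f}%")
print("High performers (80%+):", sum(score >= 80.0 for score in scores))

# Group scores by organization, then rank organizations by their mean score,
# as in the "Top Organizations" list above.
by_org = defaultdict(list)
for _, org, score in rows:
    by_org[org].append(score)

for org, org_scores in sorted(by_org.items(), key=lambda kv: -sum(kv[1]) / len(kv[1])):
    print(f"{org}: {len(org_scores)} models, avg {sum(org_scores) / len(org_scores):.1f}%")
```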
Leaderboard
17 models ranked by performance on MMLU-Redux
Release Date | License | Score
---|---|---
Jul 25, 2025 | Apache 2.0 | 93.8%
May 28, 2025 | MIT | 93.4%
Jul 22, 2025 | Apache 2.0 | 93.1%
Jul 11, 2025 | MIT | 92.7%
Sep 5, 2025 | MIT | 92.7%
Sep 10, 2025 | Apache 2.0 | 92.5%
Jan 10, 2025 | MIT | 91.8%
Sep 10, 2025 | Apache 2.0 | 90.9%
Dec 25, 2024 | MIT + Model License (commercial use allowed) | 89.1%
Apr 29, 2025 | Apache 2.0 | 87.4%
Showing 1 to 10 of 17 models