MMLU-redux-2.0

text

About

MMLU-Redux 2.0 is the second iteration of the refined Massive Multitask Language Understanding (MMLU) benchmark. It revises question quality, evaluation metrics, and domain coverage, with the aim of a more reliable and comprehensive assessment of language models' multidisciplinary knowledge and reasoning abilities.

Evaluation Stats

Total Models: 1

Organizations: 1

Verified Results: 0

Self-Reported: 1

Benchmark Details

Max Score: 1

Language

Performance Overview

Score distribution and top performers

Score Distribution

1 model

Top Score

90.2%

Average Score

90.2%

High Performers (80%+)

Top Organizations

#1 Moonshot AI

1 model

90.2%

Leaderboard

1 model ranked by performance on MMLU-redux-2.0

Rank	Model	Organization	Date	License	Score
#01	Kimi K2 Base	Moonshot AI	Jul 11, 2025	MIT	90.2%
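
The summary figures under Performance Overview (top score, average score, high performers at 80%+) can be derived from the leaderboard rows. The following is a minimal sketch of that arithmetic, not the site's actual implementation. The `Entry` class and the `summarize` and `as_percent` helpers are hypothetical names, and storing 90.2% as 0.902 assumes that scores are fractions of the listed Max Score of 1.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Entry:
    """One leaderboard row (hypothetical structure for illustration)."""
    model: str
    organization: str
    score: float  # assumed to be a fraction of the max score of 1


def summarize(entries, high_threshold=0.80):
    """Compute the overview statistics from a list of leaderboard entries."""
    scores = [e.score for e in entries]
    return {
        "total_models": len(entries),
        "organizations": len({e.organization for e in entries}),
        "top_score": max(scores),
        "average_score": mean(scores),
        # Count of models at or above the 80% threshold.
        "high_performers": sum(s >= high_threshold for s in scores),
    }


def as_percent(score, max_score=1.0):
    """Format a raw score as a percentage of the maximum score."""
    return f"{100 * score / max_score:.1f}%"


# The single entry listed on this page; 0.902 is the assumed raw form of 90.2%.
leaderboard = [Entry("Kimi K2 Base", "Moonshot AI", 0.902)]
summary = summarize(leaderboard)
print(as_percent(summary["top_score"]), as_percent(summary["average_score"]))
```

With only one model ranked, the top score and the average score are necessarily the same value, which is why both appear as 90.2% above.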

Resources

Research Paper