MMLU-ProX

About

MMLU-ProX is a multilingual extension of the MMLU-Pro benchmark. It carries over MMLU-Pro's reasoning-focused, ten-option multiple-choice questions and translates them into a wide range of languages, so that models' knowledge and reasoning can be evaluated consistently across languages. This page tracks results on the English (en) split.

Evaluation Stats
  Total Models: 8
  Organizations: 2
  Verified Results: 0
  Self-Reported: 8
Benchmark Details
  Max Score: 1
  Language: en
Performance Overview
Score distribution and top performers

Score Distribution
  Models: 8
  Top Score: 81.0%
  Average Score: 46.5%
  High Performers (80%+): 1

Top Organizations
  #1 Alibaba Cloud / Qwen Team: 4 models, 79.0% avg
  #2 Google: 4 models, 14.0% avg
Leaderboard
8 models ranked by performance on MMLU-ProX

  Rank  Date          License      Score
  #1    Jul 25, 2025  Apache 2.0   81.0%
  #2    Jul 22, 2025  Apache 2.0   79.4%
  #3    Sep 10, 2025  Apache 2.0   78.7%
  #4    Sep 10, 2025  Apache 2.0   76.7%
  #5    May 20, 2025  Gemma        19.9%
  #6    Jun 26, 2025  Proprietary  19.9%
  #7    May 20, 2025  Gemma         8.1%
  #8    Jun 26, 2025  Proprietary   8.1%
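As a cross-check, the summary numbers above (top score, average score, high-performer count, and per-organization averages) can be recomputed from the eight leaderboard scores. The grouping of rows by organization is an assumption inferred from the license column (Apache 2.0 for the Qwen Team models, Gemma/Proprietary for the Google models), consistent with the "4 models" counts in Top Organizations:

```python
# Per-model scores taken directly from the leaderboard rows, top to bottom.
scores = [81.0, 79.4, 78.7, 76.7, 19.9, 19.9, 8.1, 8.1]

top_score = max(scores)                           # 81.0% top score
average = sum(scores) / len(scores)               # ~46.5% average score
high_performers = sum(s >= 80.0 for s in scores)  # 1 model at 80%+

# Assumed grouping: first four rows (Apache 2.0) = Alibaba Cloud / Qwen Team,
# last four rows (Gemma / Proprietary) = Google.
qwen_avg = sum(scores[:4]) / 4    # ~79.0%, matching the Top Organizations entry
google_avg = sum(scores[4:]) / 4  # ~14.0%, matching the Top Organizations entry
```

The recomputed values agree with the reported summary statistics to within the page's one-decimal rounding.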
Resources