MMLU-ProX
text
About
MMLU-ProX is a specialized variant of the Massive Multitask Language Understanding benchmark designed for proximity-based evaluation of language models. It assesses how well models handle similar or related questions within the same academic domain, testing whether they apply knowledge consistently and robustly across closely related concepts.
Evaluation Stats
Total Models: 8
Organizations: 2
Verified Results: 0
Self-Reported: 8
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers
Score Distribution: 8 models
Top Score: 81.0%
Average Score: 46.5%
High Performers (80%+): 1
Top Organizations
#1 Alibaba Cloud / Qwen Team: 4 models, 79.0%
#2 Google: 4 models, 14.0%
Leaderboard
8 models ranked by performance on MMLU-ProX
Date | License | Score
---|---|---
Jul 25, 2025 | Apache 2.0 | 81.0%
Jul 22, 2025 | Apache 2.0 | 79.4%
Sep 10, 2025 | Apache 2.0 | 78.7%
Sep 10, 2025 | Apache 2.0 | 76.7%
May 20, 2025 | Gemma | 19.9%
Jun 26, 2025 | Proprietary | 19.9%
May 20, 2025 | Gemma | 8.1%
Jun 26, 2025 | Proprietary | 8.1%
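
The summary figures in the Performance Overview can be cross-checked against the leaderboard scores above. The following is a minimal sketch, not the site's actual computation. The helper name `mean_pct` is hypothetical, and splitting the rows into two organizations (first four rows for Alibaba Cloud / Qwen Team, last four for Google) is an assumption, since the listing shows no model names.

```python
from decimal import Decimal, ROUND_HALF_UP

# Scores (%) copied from the leaderboard rows, in listed order.
scores = [Decimal(s) for s in
          ("81.0", "79.4", "78.7", "76.7", "19.9", "19.9", "8.1", "8.1")]


def mean_pct(values):
    """Arithmetic mean, rounded to one decimal place (hypothetical helper)."""
    avg = sum(values) / len(values)
    return avg.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


top_score = max(scores)
average = mean_pct(scores)
high_performers = sum(s >= 80 for s in scores)  # threshold from "80%+"

# ASSUMPTION: the first four rows are the Alibaba Cloud / Qwen Team models
# and the last four are the Google models; the table itself does not say so.
qwen_avg = mean_pct(scores[:4])
google_avg = mean_pct(scores[4:])

print(f"Top score:        {top_score}%")
print(f"Average score:    {average}%")
print(f"High performers:  {high_performers}")
print(f"Qwen average:     {qwen_avg}%")
print(f"Google average:   {google_avg}%")
```

Under that grouping, the per-organization percentages match the mean score of each organization's four models, which suggests the organization ranking is by average score rather than best score.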