MMLU-STEM

About

MMLU-STEM is the subset of the Massive Multitask Language Understanding (MMLU) benchmark that covers the Science, Technology, Engineering, and Mathematics subjects. It provides a targeted assessment of a language model's technical knowledge and quantitative reasoning across scientific and mathematical domains.
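For context, the sketch below shows one common way a score of this kind is computed: answer every multiple-choice question in the STEM subjects and report overall accuracy. It is a minimal sketch only, assuming the Hugging Face cais/mmlu dataset, a typical STEM subject grouping, and a hypothetical answer_question() hook for the model under test; none of these details are specified on this page.

```python
# Minimal sketch of an MMLU-STEM accuracy computation.
# Assumptions (not prescribed by this leaderboard): the "cais/mmlu" dataset
# and a hypothetical answer_question() hook for the model under test.
from datasets import load_dataset

# One commonly used STEM grouping of MMLU subjects; the exact subset behind
# a given self-reported score may differ.
STEM_SUBJECTS = [
    "abstract_algebra", "astronomy", "college_biology", "college_chemistry",
    "college_computer_science", "college_mathematics", "college_physics",
    "computer_security", "conceptual_physics", "electrical_engineering",
    "elementary_mathematics", "high_school_biology", "high_school_chemistry",
    "high_school_computer_science", "high_school_mathematics",
    "high_school_physics", "high_school_statistics", "machine_learning",
]

def answer_question(question: str, choices: list[str]) -> int:
    """Placeholder: query the model under test and return the index (0-3)
    of the answer choice it selects."""
    raise NotImplementedError

def evaluate_mmlu_stem() -> float:
    """Return accuracy in [0, 1]; the leaderboard shows it as a percentage
    (a max score of 1 corresponds to 100%)."""
    correct = total = 0
    for subject in STEM_SUBJECTS:
        for example in load_dataset("cais/mmlu", subject, split="test"):
            prediction = answer_question(example["question"], example["choices"])
            correct += int(prediction == example["answer"])
            total += 1
    return correct / total
```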

Evaluation Stats
Total Models: 2
Organizations: 1
Verified Results: 0
Self-Reported: 2
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers

Score Distribution (2 models)
Top Score: 80.9%
Average Score: 78.6%
High Performers (80%+): 1

Top Organizations

#1 Alibaba Cloud / Qwen Team: 2 models, 78.6%
Leaderboard
2 models ranked by performance on MMLU-STEM
Rank  Date          License     Score
#1    Sep 19, 2024  Apache 2.0  80.9%
#2    Sep 19, 2024  Apache 2.0  76.4%
Resources