MMLU-STEM
text
About
MMLU-STEM is a specialized subset of the Massive Multitask Language Understanding benchmark focusing exclusively on Science, Technology, Engineering, and Mathematics domains. It evaluates language models' technical knowledge and quantitative reasoning capabilities across STEM subjects, providing targeted assessment of scientific and mathematical understanding.
Evaluation Stats
Total Models: 2
Organizations: 1
Verified Results: 0
Self-Reported: 2
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers
Score Distribution: 2 models
Top Score: 80.9%
Average Score: 78.6%
High Performers (80%+): 1
Top Organizations
#1 Alibaba Cloud / Qwen Team (2 models, 78.6%)
Leaderboard
2 models ranked by performance on MMLU-STEM
Date | License | Score
---|---|---
Sep 19, 2024 | Apache 2.0 | 80.9%
Sep 19, 2024 | Apache 2.0 | 76.4%
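
The summary figures in the Performance Overview can be checked against the two leaderboard scores. The following is a minimal sketch, not the site's own computation: the variable names are hypothetical, the 80% cutoff is taken from the "High Performers (80%+)" label, and the scores are assumed to be percentages. The unrounded mean comes out at 78.65, which the page displays as 78.6%.

```python
# Scores copied from the two leaderboard rows (assumed to be percent accuracy).
scores = [80.9, 76.4]

# Cutoff taken from the "High Performers (80%+)" label; an assumption about
# how the page defines a high performer.
HIGH_PERFORMER_THRESHOLD = 80.0

top_score = max(scores)
average_score = sum(scores) / len(scores)
high_performers = sum(1 for s in scores if s >= HIGH_PERFORMER_THRESHOLD)

print(f"Top score: {top_score:.1f}%")
print(f"Average score: {average_score:.2f}%")
print(f"High performers (80%+): {high_performers}")
```

Because both listed models come from the same organization, the organization-level figure of 78.6% matches the overall average.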