MMLU-STEM

text

About

MMLU-STEM is a specialized subset of the Massive Multitask Language Understanding (MMLU) benchmark that covers only Science, Technology, Engineering, and Mathematics subjects. It uses MMLU's four-option multiple-choice questions to evaluate language models' technical knowledge and quantitative reasoning, giving a targeted measure of scientific and mathematical understanding.
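Scoring a model on a subset like this comes down to multiple-choice accuracy restricted to STEM subjects. The sketch below is a minimal illustration under stated assumptions: it uses plain micro-averaged accuracy over items (the page does not say whether results are averaged per item or per subject), and `STEM_SUBJECTS` is a hypothetical set holding only a few real MMLU subject names, not the official STEM grouping.

```python
from dataclasses import dataclass


@dataclass
class Item:
    subject: str  # MMLU subject name, e.g. "college_physics"
    answer: str   # gold choice letter, "A".."D"


# Hypothetical, incomplete STEM grouping used only for illustration;
# the official MMLU-STEM subject list may differ.
STEM_SUBJECTS = {
    "abstract_algebra",
    "college_physics",
    "high_school_mathematics",
    "electrical_engineering",
}


def stem_accuracy(items: list[Item], predictions: list[str]) -> float:
    """Fraction of STEM items whose predicted letter matches the gold answer.

    Assumes micro-averaging: every STEM item counts equally, regardless of subject.
    """
    pairs = [(item, pred) for item, pred in zip(items, predictions)
             if item.subject in STEM_SUBJECTS]
    if not pairs:
        raise ValueError("no STEM items to score")
    correct = sum(1 for item, pred in pairs if pred == item.answer)
    return correct / len(pairs)
```

For example, with three STEM items of which two are answered correctly, plus one non-STEM item (`world_religions`) that is ignored, the function returns 2/3. A result in [0, 1] matches the page's Max Score of 1.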

Evaluation Stats

Total Models: 2
Organizations: 1
Verified Results: 0
Self-Reported: 2

Benchmark Details

Max Score: 1

Performance Overview

Score distribution and top performers

Score Distribution: 2 models

Top Score: 80.9%
Average Score: 78.6%

High Performers (80%+): 1

Top Organizations

#1 Alibaba Cloud / Qwen Team: 2 models, average score 78.6%

Leaderboard

2 models ranked by performance on MMLU-STEM

Rank	Model	Organization	Release Date	License	Score
#01	Qwen2.5 32B Instruct	Alibaba Cloud / Qwen Team	Sep 19, 2024	Apache 2.0	80.9%
#02	Qwen2.5 14B Instruct	Alibaba Cloud / Qwen Team	Sep 19, 2024	Apache 2.0	76.4%
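The summary figures in the performance overview follow directly from the two leaderboard scores. A small sketch recomputing them, with each score expressed as a fraction of the max score of 1; the `scores` dict and the variable names are illustrative, not part of any leaderboard API:

```python
from statistics import mean

# Scores as displayed on the leaderboard, as fractions of the max score (1.0).
scores = {
    "Qwen2.5 32B Instruct": 0.809,
    "Qwen2.5 14B Instruct": 0.764,
}

# Top score: the model with the highest score.
top_model = max(scores, key=scores.get)
top_score = scores[top_model]

# Average score across all ranked models (also the organization average here,
# since both models come from the same organization).
average = mean(scores.values())

# "High performers": models scoring 80% or more.
high_performers = [model for model, s in scores.items() if s >= 0.80]

print(f"Top: {top_model} {top_score:.1%}")
print(f"Average: {average:.2%}")
print(f"High performers (80%+): {len(high_performers)}")
```

The exact mean of the two displayed scores is 78.65%. The page shows 78.6%, presumably because the underlying scores carry more precision than the one decimal shown or the value is truncated rather than rounded.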

Resources

Research Paper