MMLU-Base
About
MMLU-Base represents the foundational version of the Massive Multitask Language Understanding benchmark, providing baseline evaluation across 57 academic and professional domains. It serves as the standard reference point for measuring language models' broad knowledge and reasoning capabilities, covering subjects from elementary mathematics to advanced professional fields like law and medicine.
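Scores on this page are multiple-choice accuracy: the fraction of questions where the model's selected option matches the gold answer. A minimal sketch of that scoring, using hypothetical example data (not actual MMLU items):

```python
# Minimal sketch of MMLU-style accuracy scoring, assuming each item
# has one gold answer letter (A-D) and one predicted letter.
# The answer lists below are hypothetical examples, not real MMLU data.

def mmlu_accuracy(gold, predicted):
    """Fraction of items where the predicted choice matches the gold answer."""
    if not gold:
        return 0.0
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)

gold = ["A", "C", "B", "D", "A"]       # hypothetical gold answers
predicted = ["A", "C", "D", "D", "B"]  # hypothetical model choices
print(f"{mmlu_accuracy(gold, predicted):.1%}")  # 3/5 correct -> 60.0%
```

A reported score like 68.0% is this accuracy averaged over all of the benchmark's questions across its 57 subjects.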
Evaluation Stats
Total Models: 1
Organizations: 1
Verified Results: 0
Self-Reported: 1
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers
Score Distribution: 1 model
Top Score: 68.0%
Average Score: 68.0%
High Performers (80%+): 0

Top Organizations
#1 Alibaba Cloud / Qwen Team: 1 model, 68.0%
Leaderboard
1 model ranked by performance on MMLU-Base
| Release Date | License | Score |
|---|---|---|
| Sep 19, 2024 | Apache 2.0 | 68.0% |