MMLU Chat

About

MMLU-Chat adapts the Massive Multitask Language Understanding benchmark for conversational AI evaluation, testing language models' ability to apply broad academic knowledge in chat-based interactions. This variant assesses how well models can engage in educational discussions and provide accurate information across 57 domains in a conversational format.

Evaluation Stats
Total Models: 1
Organizations: 1
Verified Results: 0
Self-Reported: 1
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers

Score Distribution: 1 model
Top Score: 80.6%
Average Score: 80.6%
High Performers (80%+): 1

Top Organizations

#1 NVIDIA: 1 model, 80.6%
Leaderboard
1 model ranked by performance on MMLU Chat
Date: Oct 1, 2024
License: Llama 3.1 Community License
Score: 80.6%
Resources