MMLU Chat

About

MMLU-Chat adapts the Massive Multitask Language Understanding benchmark for conversational AI evaluation, testing language models' ability to apply broad academic knowledge in chat-based interactions. This variant assesses how well models can engage in educational discussions and provide accurate information across 57 domains in a conversational format.

Evaluation Stats
Total Models: 1
Organizations: 1
Verified Results: 0
Self-Reported: 1
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers

Score Distribution: 1 model
Top Score: 80.6%
Average Score: 80.6%
High Performers (80%+): 1

Top Organizations

#1 NVIDIA: 1 model, 80.6%
Leaderboard
1 model ranked by performance on MMLU Chat
Date: Oct 1, 2024
License: Llama 3.1 Community License
Score: 80.6%
Resources