HLE

multimodal

About

Humanity's Last Exam (HLE) is a multimodal academic benchmark of 2,500 questions spanning mathematics, the humanities, and the natural sciences. It is designed to test LLM capabilities at the frontier of human knowledge using questions with unambiguous, verifiable solutions.

Evaluation Stats

Total Models: 5

Organizations: 3

Verified Results: 0

Self-Reported: 5

Benchmark Details

Max Score: 1

Language

Performance Overview

Score distribution and top performers

Score Distribution

5 models

Top Score

20.0%

Average Score

13.4%

High Performers (80%+)

0

Top Organizations

#1 xAI

1 model

20.0%

#2 Zhipu AI

3 models

14.1%

#3 Moonshot AI

1 model

4.7%

Leaderboard

5 models ranked by performance on HLE

Rank	Model	Organization	Date	License	Score
#01	Grok 4 Fast	xAI	Aug 28, 2025	Proprietary	20.0%
#02	GLM-4.6	Zhipu AI	Sep 30, 2025	MIT	17.2%
#03	GLM-4.5	Zhipu AI	Jul 28, 2025	MIT	14.4%
#04	GLM-4.5-Air	Zhipu AI	Jul 28, 2025	MIT	10.6%
#05	Kimi K2-Instruct-0905	Moonshot AI	Sep 5, 2025	MIT	4.7%
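
The summary figures on this page (top score, average score, and per-organization scores) can be reproduced from the five leaderboard rows. The sketch below is only an illustration: the `LEADERBOARD` list and the `summarize` function are hypothetical names, and it assumes that an organization's score is the plain mean of its models' scores, which matches the 14.1% shown for Zhipu AI but is not stated anywhere on the page.

```python
from collections import defaultdict

# Rows copied from the leaderboard: (model, organization, score in percent).
LEADERBOARD = [
    ("Grok 4 Fast", "xAI", 20.0),
    ("GLM-4.6", "Zhipu AI", 17.2),
    ("GLM-4.5", "Zhipu AI", 14.4),
    ("GLM-4.5-Air", "Zhipu AI", 10.6),
    ("Kimi K2-Instruct-0905", "Moonshot AI", 4.7),
]


def summarize(rows, high_threshold=80.0):
    """Recompute the page's summary statistics from leaderboard rows."""
    scores = [score for _, _, score in rows]

    # Group scores by organization.
    by_org = defaultdict(list)
    for _, org, score in rows:
        by_org[org].append(score)

    # Assumption: an organization's score is the mean of its models' scores.
    org_means = {org: sum(vals) / len(vals) for org, vals in by_org.items()}

    return {
        "top_score": max(scores),
        "average_score": sum(scores) / len(scores),
        "high_performers": sum(1 for s in scores if s >= high_threshold),
        "org_ranking": sorted(org_means.items(), key=lambda kv: kv[1], reverse=True),
    }


if __name__ == "__main__":
    stats = summarize(LEADERBOARD)
    print(f"Top score: {stats['top_score']:.1f}%")
    print(f"Average score: {stats['average_score']:.1f}%")
    print(f"High performers (80%+): {stats['high_performers']}")
    for rank, (org, mean) in enumerate(stats["org_ranking"], start=1):
        print(f"#{rank} {org}: {mean:.1f}%")
```

Rounded to one decimal place, the computed values agree with the figures shown in the Performance Overview section.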

Resources

Research Paper