HLE

multimodal
About

Humanity's Last Exam (HLE) is a multimodal academic benchmark of 2,500 questions spanning mathematics, the humanities, and the natural sciences. It is designed to test LLM capabilities at the frontier of human knowledge, using questions with unambiguous, verifiable solutions.

Evaluation Stats
Total Models: 4
Organizations: 2
Verified Results: 0
Self-Reported: 4
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers

Score Distribution

4 models
Top Score: 17.2%
Average Score: 11.7%
High Performers (80%+): 0

Top Organizations

#1 Zhipu AI (3 models): 14.1%
#2 Moonshot AI (1 model): 4.7%
Leaderboard
4 models ranked by performance on HLE
Date | License | Score
Sep 30, 2025 | MIT | 17.2%
Jul 28, 2025 | MIT | 14.4%
Jul 28, 2025 | MIT | 10.6%
Sep 5, 2025 | MIT | 4.7%
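
The summary figures under Performance Overview (top score, average score, and the count of high performers) can be reproduced from the four leaderboard scores above. The following is a minimal sketch, not the site's actual computation: the function name `summarize` and the rounding to one decimal place are assumptions made for illustration.

```python
from statistics import mean

# Scores in percent, copied from the four leaderboard rows above.
scores = [17.2, 14.4, 10.6, 4.7]

# Cutoff taken from the "High Performers (80%+)" label.
HIGH_PERFORMER_THRESHOLD = 80.0


def summarize(scores, threshold=HIGH_PERFORMER_THRESHOLD):
    """Hypothetical helper: compute the overview statistics for a score list."""
    return {
        "models": len(scores),
        "top": max(scores),
        # Rounding to one decimal is an assumption that matches the page's display.
        "average": round(mean(scores), 1),
        "high_performers": sum(s >= threshold for s in scores),
    }


summary = summarize(scores)
print(summary)
```

Under these assumptions the result agrees with the figures shown on the page: 4 models, a top score of 17.2%, an average of 11.7%, and no model at or above 80%.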
Resources