HLE
multimodal
About
Humanity's Last Exam (HLE) is a multimodal academic benchmark of 2,500 questions across mathematics, the humanities, and the natural sciences. It is designed to test LLM capabilities at the frontier of human knowledge, using questions with unambiguous, verifiable solutions.
Evaluation Stats
Total Models: 5
Organizations: 3
Verified Results: 0
Self-Reported: 5
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers
Score Distribution
5 models
Top Score: 20.0%
Average Score: 13.4%
High Performers (80%+): 0
Top Organizations
#1 xAI (1 model): 20.0%
#2 Zhipu AI (3 models): 14.1%
#3 Moonshot AI (1 model): 4.7%
Leaderboard
5 models ranked by performance on HLE
| Date | License | Score |
|---|---|---|
| Aug 28, 2025 | Proprietary | 20.0% |
| Sep 30, 2025 | MIT | 17.2% |
| Jul 28, 2025 | MIT | 14.4% |
| Jul 28, 2025 | MIT | 10.6% |
| Sep 5, 2025 | MIT | 4.7% |
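
The summary figures in the Performance Overview can be reproduced from the five leaderboard scores. The following is a minimal sketch rather than the site's actual aggregation logic. The pairing of each score with an organization is an assumption inferred from the Top Organizations figures (the leaderboard rows do not name their models), and `summarize` and its `high_threshold` parameter are hypothetical names introduced for illustration.

```python
from collections import defaultdict

# Scores in percent, taken from the leaderboard rows.
# ASSUMPTION: the organization attached to each score is inferred from the
# per-organization summary, not stated in the leaderboard rows themselves.
rows = [
    ("xAI", 20.0),
    ("Zhipu AI", 17.2),
    ("Zhipu AI", 14.4),
    ("Zhipu AI", 10.6),
    ("Moonshot AI", 4.7),
]


def summarize(rows, high_threshold=80.0):
    """Compute overview statistics from (organization, score) pairs."""
    scores = [score for _, score in rows]

    # Group scores by organization so each group can be averaged.
    by_org = defaultdict(list)
    for org, score in rows:
        by_org[org].append(score)

    return {
        "total_models": len(scores),
        "organizations": len(by_org),
        "top_score": max(scores),
        "average_score": round(sum(scores) / len(scores), 1),
        # "High performers" are models scoring at or above the threshold.
        "high_performers": sum(score >= high_threshold for score in scores),
        "org_average": {
            org: round(sum(vals) / len(vals), 1) for org, vals in by_org.items()
        },
    }


summary = summarize(rows)
print(summary)
```

Under this pairing, the overall average rounds to 13.4% and the Zhipu AI average to 14.1%, matching the figures reported above, which suggests the per-organization number is a mean over that organization's models.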