HLE with Tools

Reasoning
About

HLE with Tools is Humanity's Last Exam (HLE) evaluated in an agentic setting: models get web search, code execution, and an extended context of up to 3M tokens. It tests whether frontier models can solve expert-level, multi-disciplinary academic questions when given tool access.

Evaluation Stats
Total Models: 6
Organizations: 3
Verified Results: 0
Self-Reported: 6
Benchmark Details
Max Score: 100
Performance Overview
Score distribution and top performers

Score Distribution

6 models
Top Score
53.0%
Average Score
45.8%
High Performers (80%+)
0
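
The summary figures above can be re-derived from the six leaderboard scores on this page. The following is a minimal sketch, not the site's actual computation: the function name `summarize` and the half-up rounding to one decimal place are assumptions made for illustration.

```python
from decimal import Decimal, ROUND_HALF_UP

# Scores (percent) copied from the leaderboard on this page.
SCORES = [Decimal(s) for s in ("53.0", "50.0", "49.0", "45.8", "43.4", "33.6")]


def summarize(scores, high_threshold=Decimal("80")):
    """Return count, top score, mean (one decimal), and the number of high performers.

    Hypothetical helper: the rounding mode is an assumption, not documented by the page.
    """
    mean = sum(scores) / len(scores)
    return {
        "models": len(scores),
        "top": max(scores),
        "average": mean.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP),
        "high_performers": sum(1 for s in scores if s >= high_threshold),
    }


summary = summarize(SCORES)
print(summary)
```

Decimal arithmetic is used instead of floats so that the one-decimal rounding is not affected by binary representation error.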

Top Organizations

#1 OpenAI (1 model): 50.0%
#2 Google DeepMind (1 model): 45.8%
#3 Anthropic (4 models): 44.8%
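
The organization figures appear consistent with a per-organization mean of leaderboard scores. The sketch below checks that reading under two loudly stated assumptions: the page does not say which leaderboard rows belong to which organization, so the grouping in `ORG_SCORES` is an inference that happens to reproduce the listed numbers, and ranking by mean score is likewise assumed rather than documented.

```python
from decimal import Decimal, ROUND_HALF_UP

# ASSUMPTION: this row-to-organization grouping is inferred, not stated on the page.
ORG_SCORES = {
    "OpenAI": ["50.0"],
    "Google DeepMind": ["45.8"],
    "Anthropic": ["53.0", "49.0", "43.4", "33.6"],
}


def org_average(raw_scores):
    """Mean of the scores, rounded half-up to one decimal place (assumed rounding)."""
    scores = [Decimal(s) for s in raw_scores]
    mean = sum(scores) / len(scores)
    return mean.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


# Rank organizations by mean score, highest first.
ranking = sorted(
    ((org, org_average(raw)) for org, raw in ORG_SCORES.items()),
    key=lambda pair: pair[1],
    reverse=True,
)

for rank, (org, avg) in enumerate(ranking, start=1):
    print(f"#{rank} {org}: {avg}%")
```

Under this grouping, the organization with the top-scoring model still ranks last, because its average is pulled down by its lower-scoring entries.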
Leaderboard
6 models ranked by performance on HLE with Tools
Date           License       Score
Feb 1, 2026    Proprietary   53.0%
Dec 11, 2025   Proprietary   50.0%
Feb 17, 2026   Proprietary   49.0%
Nov 18, 2025   Proprietary   45.8%
Nov 1, 2025    Proprietary   43.4%
Sep 29, 2025   Proprietary   33.6%
Resources