HLE with Tools
Reasoning
About
HLE with Tools evaluates Humanity's Last Exam in an agentic setting with web search, code execution, and extended context of up to 3M tokens. It tests whether frontier models can solve expert-level, multi-disciplinary academic questions when augmented with tool access.
Evaluation Stats
Total Models: 6
Organizations: 3
Verified Results: 0
Self-Reported: 6
Benchmark Details
Max Score: 100
Performance Overview
Score distribution and top performers
Score Distribution
6 models
Top Score
53.0%
Average Score
45.8%
High Performers (80%+)
0
Top Organizations
#1 OpenAI
1 model
50.0%
#2 Google DeepMind
1 model
45.8%
#3 Anthropic
4 models
44.8%
Leaderboard
6 models ranked by performance on HLE with Tools
| Date | License | Score |
|---|---|---|
| Feb 1, 2026 | Proprietary | 53.0% |
| Dec 11, 2025 | Proprietary | 50.0% |
| Feb 17, 2026 | Proprietary | 49.0% |
| Nov 18, 2025 | Proprietary | 45.8% |
| Nov 1, 2025 | Proprietary | 43.4% |
| Sep 29, 2025 | Proprietary | 33.6% |
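
The summary figures in the Performance Overview can be recomputed from the six leaderboard scores. The following is a minimal sketch, assuming the average is a plain arithmetic mean rounded to one decimal place and that "High Performers" means a score of 80% or more. The variable names are illustrative and not part of the site.

```python
# Scores copied from the leaderboard rows, in percent.
scores = [53.0, 50.0, 49.0, 45.8, 43.4, 33.6]

# Highest score on the leaderboard.
top_score = max(scores)

# Assumption: the reported average is an unweighted mean, rounded to 1 decimal.
average_score = round(sum(scores) / len(scores), 1)

# Assumption: "High Performers (80%+)" counts scores at or above 80.
high_performers = sum(1 for s in scores if s >= 80.0)

print(f"Models: {len(scores)}")
print(f"Top score: {top_score}%")
print(f"Average score: {average_score}%")
print(f"High performers (80%+): {high_performers}")
```

Under these assumptions the recomputed values agree with the reported top score of 53.0%, average of 45.8%, and zero high performers.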