HLE with Tools

Reasoning

About

HLE with Tools is Humanity's Last Exam (HLE) evaluated in an agentic setting: models get web search, code execution, and an extended context of up to 3M tokens. It tests whether frontier models can solve expert-level, multi-disciplinary academic questions when augmented with tool access.

Evaluation Stats

Total Models: 6

Organizations: 3

Verified Results: 0

Self-Reported: 6

Benchmark Details

Max Score: 100

Performance Overview

Score distribution and top performers

Top Score: 53.0%

Average Score: 45.8%

High Performers (80%+): 0

Top Organizations

Rank	Organization	Models	Average Score
#1	OpenAI	1	50.0%
#2	Google DeepMind	1	45.8%
#3	Anthropic	4	44.8%

Leaderboard

6 models ranked by performance on HLE with Tools

Rank	Model	Organization	Release Date	License	Score
#01	Claude Opus 4.6	Anthropic	Feb 1, 2026	Proprietary	53.0%
#02	GPT-5.2	OpenAI	Dec 11, 2025	Proprietary	50.0%
#03	Claude Sonnet 4.6	Anthropic	Feb 17, 2026	Proprietary	49.0%
#04	Gemini 3 Pro	Google DeepMind	Nov 18, 2025	Proprietary	45.8%
#05	Claude Opus 4.5	Anthropic	Nov 1, 2025	Proprietary	43.4%
#06	Claude Sonnet 4.5	Anthropic	Sep 29, 2025	Proprietary	33.6%
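
The summary figures on this page (top score, average score, and the per-organization percentages) appear to be simple aggregates of the six leaderboard rows. The sketch below recomputes them from the scores listed above. It assumes an unweighted arithmetic mean rounded half-up to one decimal place; the page does not state its aggregation or rounding method, and the names `ROWS` and `mean_1dp` are hypothetical.

```python
from collections import defaultdict
from decimal import Decimal, ROUND_HALF_UP

# (model, organization, score in percent), copied from the leaderboard rows.
ROWS = [
    ("Claude Opus 4.6", "Anthropic", "53.0"),
    ("GPT-5.2", "OpenAI", "50.0"),
    ("Claude Sonnet 4.6", "Anthropic", "49.0"),
    ("Gemini 3 Pro", "Google DeepMind", "45.8"),
    ("Claude Opus 4.5", "Anthropic", "43.4"),
    ("Claude Sonnet 4.5", "Anthropic", "33.6"),
]


def mean_1dp(values):
    """Unweighted mean, rounded half-up to one decimal (assumed rounding rule)."""
    total = sum(values, Decimal("0"))
    return (total / len(values)).quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


# Decimal avoids binary floating-point drift when summing values like 43.4.
scores = [Decimal(s) for _, _, s in ROWS]
top_score = max(scores)
average_score = mean_1dp(scores)
high_performers = sum(1 for s in scores if s >= Decimal("80"))

# Group scores by organization, then rank organizations by their mean score.
by_org = defaultdict(list)
for _, org, s in ROWS:
    by_org[org].append(Decimal(s))
org_means = {org: mean_1dp(vals) for org, vals in by_org.items()}
org_ranking = sorted(org_means.items(), key=lambda kv: kv[1], reverse=True)

print(f"Top score: {top_score}%  Average: {average_score}%  80%+: {high_performers}")
for rank, (org, mean) in enumerate(org_ranking, start=1):
    print(f"#{rank} {org}: {len(by_org[org])} model(s), {mean}%")
```

Under these assumptions the recomputed values match the page: the Anthropic mean of 44.75 rounds to 44.8% only with half-up rounding, which is why that rule is assumed here.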