GDPVal
Agents
About
GDPVal evaluates AI models on well-specified professional tasks across finance, healthcare, government, and other GDP-contributing sectors, measuring readiness for real-world occupational work.
Evaluation Stats
Total Models: 11
Organizations: 4
Verified Results: 0
Self-Reported: 0
Benchmark Details
Max Score: 100
Performance Overview
Score distribution and top performers
Score Distribution: 11 models
Top Score: 49.7%
Average Score: 33.3%
High Performers (80%+): 0
Top Organizations
| Rank | Organization | Models | Average Score |
|---|---|---|---|
| #1 | Anthropic | 3 | 43.9% |
| #2 | Google DeepMind | 2 | 31.8% |
| #3 | OpenAI | 5 | 30.1% |
| #4 | xAI | 1 | 21.1% |
Leaderboard
11 models ranked by performance on GDPVal
| Model | Release Date | License | Score |
|---|---|---|---|
| GPT-5.2 | Dec 11, 2025 | Proprietary | 49.7% |
| Claude Opus 4.5 | Nov 24, 2025 | Proprietary | 45.5% |
| Claude Opus 4.1 | Aug 5, 2025 | Proprietary | 43.6% |
| Claude Sonnet 4.5 | Sep 29, 2025 | Proprietary | 42.5% |
| Gemini 3 Pro | Nov 18, 2025 | Proprietary | 40.3% |
| GPT-5 | Aug 7, 2025 | Proprietary | 34.8% |
| o3 | Apr 16, 2025 | Proprietary | 30.8% |
| o4 mini | Apr 16, 2025 | Proprietary | 25.3% |
| Gemini 2.5 Pro | Mar 25, 2025 | Proprietary | 23.3% |
| Grok 4 | Jul 9, 2025 | Proprietary | 21.1% |
Showing 1 to 10 of 11 models
Additional Metrics
Extended metrics for top models on GDPVal
| Model | Score (%) | Human Equiv. Rate |
|---|---|---|
| GPT-5.2 | 49.7 | 70.9% |
| Claude Opus 4.5 | 45.5 | 59.6% |
| Claude Opus 4.1 | 43.6 | 47.6% |
| Claude Sonnet 4.5 | 42.5 | 50.3% |
| Gemini 3 Pro | 40.3 | 53.5% |
| GPT-5 | 34.8 | 38.0% |
| o3 | 30.8 | 34.1% |
| o4 mini | 25.3 | 27.8% |
| Gemini 2.5 Pro | 23.3 | 25.5% |
| Grok 4 | 21.1 | 24.3% |
| GPT-4o | 9.9 | 12.3% |
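
The summary figures in the Performance Overview (top score, average score, high-performer count, and per-organization averages) appear to be simple aggregates of the eleven scores in the table above. The sketch below shows one way to recompute them. The `SCORES` mapping and the `summarize` helper are hypothetical names introduced for illustration, and the organization assigned to each model is an assumption based on public vendor attribution, since the page does not list an organization per model. The page also does not state its aggregation method, so unweighted means rounded to one decimal place are assumed.

```python
from collections import defaultdict

# Scores copied from the Additional Metrics table.
# ASSUMPTION: the organization for each model is inferred from public
# vendor attribution; the page itself does not list it per model.
SCORES = {
    "GPT-5.2": ("OpenAI", 49.7),
    "Claude Opus 4.5": ("Anthropic", 45.5),
    "Claude Opus 4.1": ("Anthropic", 43.6),
    "Claude Sonnet 4.5": ("Anthropic", 42.5),
    "Gemini 3 Pro": ("Google DeepMind", 40.3),
    "GPT-5": ("OpenAI", 34.8),
    "o3": ("OpenAI", 30.8),
    "o4 mini": ("OpenAI", 25.3),
    "Gemini 2.5 Pro": ("Google DeepMind", 23.3),
    "Grok 4": ("xAI", 21.1),
    "GPT-4o": ("OpenAI", 9.9),
}


def summarize(scores, high_threshold=80.0):
    """Recompute leaderboard summary statistics from per-model scores.

    ASSUMPTION: averages are unweighted means rounded to one decimal.
    """
    values = [score for _, score in scores.values()]

    # Group scores by organization to get per-organization averages.
    by_org = defaultdict(list)
    for org, score in scores.values():
        by_org[org].append(score)

    return {
        "top": max(values),
        "mean": round(sum(values) / len(values), 1),
        "high_performers": sum(1 for v in values if v >= high_threshold),
        "org_means": {
            org: round(sum(v) / len(v), 1) for org, v in by_org.items()
        },
    }


if __name__ == "__main__":
    summary = summarize(SCORES)
    for key, value in summary.items():
        print(key, value)
```

Under these assumptions, the recomputed values agree with the figures shown in the Performance Overview, which suggests the organization rankings are ordered by mean model score rather than by best single model.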