GPQA
text
About
GPQA (Graduate-Level Google-Proof Q&A) is a challenging AI benchmark of 448 PhD-level multiple-choice questions in physics, chemistry, and biology. The questions were written by domain experts and require deep scientific understanding and multi-step reasoning rather than simple factual recall. The "Google-proof" design is intended to make answers hard to find through web searches, so the benchmark tests reasoning rather than lookup.
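As a rough illustration of how a multiple-choice benchmark like this is typically scored, the sketch below computes accuracy as the fraction of questions answered correctly. The function name `accuracy` and the sample answer letters are hypothetical. They are not taken from the GPQA dataset or from any official evaluation harness.

```python
def accuracy(predictions, answers):
    """Return the fraction of predictions that match the reference answers."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("at least one question is required")
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)


# Hypothetical answer letters for five questions (illustrative only).
reference = ["A", "C", "B", "D", "A"]
predicted = ["A", "C", "D", "D", "B"]

score = accuracy(predicted, reference)  # 3 of the 5 answers match
print(f"{score:.1%}")
```

A score computed this way lies between 0 and 1, which is consistent with the maximum score of 1 listed for this benchmark. The page displays scores as percentages.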
Evaluation Stats
Total Models: 100
Organizations: 13
Verified Results: 0
Self-Reported: 100
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers
Score Distribution: 100 models
Top Score: 88.4%
Average Score: 65.2%
High Performers (80%+): 19
Top Organizations
#1 Zhipu AI: 3 models, 78.4%
#2 xAI: 7 models, 76.7%
#3 OpenAI: 19 models, 70.8%
#4 Anthropic: 10 models, 69.6%
#5 Moonshot AI: 4 models, 68.5%
Leaderboard
100 models ranked by performance on GPQA
| Date | License | Score |
|---|---|---|
| Jul 9, 2025 | Proprietary | 88.4% |
| Jul 9, 2025 | Proprietary | 87.5% |
| Jun 5, 2025 | Proprietary | 86.4% |
| Aug 28, 2025 | Proprietary | 85.7% |
| Aug 7, 2025 | Proprietary | 85.7% |
| Feb 24, 2025 | Proprietary | 84.8% |
| Feb 17, 2025 | Proprietary | 84.6% |
| Feb 17, 2025 | Proprietary | 84.0% |
| Sep 29, 2025 | Proprietary | 83.4% |
| Apr 16, 2025 | Proprietary | 83.3% |
Showing 1 to 10 of 100 models
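The summary figures in the Performance Overview cover all 100 models, so they cannot be reproduced from the ten rows shown here. As a minimal sketch, the snippet below recomputes the top score, the mean, and the count at or above 80% for the ten visible scores only. The variable names are illustrative.

```python
from statistics import mean

# Scores (in percent) from the ten leaderboard rows shown on this page.
visible_scores = [88.4, 87.5, 86.4, 85.7, 85.7, 84.8, 84.6, 84.0, 83.4, 83.3]

top_score = max(visible_scores)
mean_score = mean(visible_scores)
# Count rows meeting the page's "High Performers (80%+)" threshold.
high_performers = sum(s >= 80.0 for s in visible_scores)

print(f"top score: {top_score:.1f}%")
print(f"mean of visible rows: {mean_score:.2f}%")
print(f"rows at 80% or above: {high_performers}")
```

The top score of the visible rows matches the reported Top Score of 88.4%. The mean of these ten rows is naturally well above the 65.2% average reported across all 100 models.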
...