GPQA
About
GPQA (Graduate-Level Google-Proof Q&A) is a benchmark of 448 PhD-level multiple-choice questions in physics, chemistry, and biology, written by domain experts. The questions call for deep scientific understanding and multi-step reasoning rather than factual recall. The "Google-proof" design means the answers are hard to find through web search, even for skilled non-experts, so the benchmark tests reasoning rather than lookup.
Evaluation Stats
Total Models: 100
Organizations: 13
Verified Results: 0
Self-Reported: 100
Benchmark Details
Max Score: 1
Language: en
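A maximum score of 1 suggests that results are stored as a fraction in [0, 1] and shown as percentages elsewhere on the page. The sketch below assumes the score is plain accuracy (correct answers divided by total questions); the page does not state the scoring method, and the function names `accuracy` and `as_percent` are hypothetical.

```python
from typing import Sequence


def accuracy(predictions: Sequence[str], answers: Sequence[str]) -> float:
    """Return the fraction of questions answered correctly, in [0, 1]."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("need at least one question")
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)


def as_percent(score: float) -> str:
    """Format a [0, 1] score as a percentage with one decimal place."""
    return f"{score * 100:.1f}%"


# Illustrative data only: four made-up answer letters, three of them correct.
example_score = accuracy(["A", "B", "C", "D"], ["A", "B", "C", "A"])
print(as_percent(example_score))
```

Under this assumption, a leaderboard entry of 88.4% corresponds to a stored score of 0.884.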
Performance Overview
Score distribution and top performers
Score Distribution: 100 models
Top Score: 88.4%
Average Score: 64.1%
High Performers (80%+): 18
Top Organizations
Rank | Organization | Models | Score
---|---|---|---
#1 | Zhipu AI | 3 | 78.4%
#2 | xAI | 6 | 75.3%
#3 | OpenAI | 20 | 69.3%
#4 | Moonshot AI | 4 | 68.5%
#5 | Anthropic | 10 | 66.3%
Leaderboard
100 models ranked by performance on GPQA
Date | License | Score
---|---|---
Jul 9, 2025 | Proprietary | 88.4%
Jul 9, 2025 | Proprietary | 87.5%
Jun 5, 2025 | Proprietary | 86.4%
Aug 7, 2025 | Proprietary | 85.7%
Feb 24, 2025 | Proprietary | 84.8%
Feb 17, 2025 | Proprietary | 84.6%
Feb 17, 2025 | Proprietary | 84.0%
Sep 29, 2025 | Proprietary | 83.4%
Apr 16, 2025 | Proprietary | 83.3%
May 20, 2025 | Proprietary | 83.0%
Showing 1 to 10 of 100 models
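The summary figures in the Performance Overview (top score, average score, count of models at 80% or above) can be recomputed from a list of scores. The sketch below uses only the ten scores visible in the leaderboard, so its average will not match the 64.1% reported for all 100 models; the `summarize` helper and its `threshold` parameter are hypothetical names for illustration.

```python
from statistics import mean

# The ten scores shown on the first page of the leaderboard, in percent.
visible_scores = [88.4, 87.5, 86.4, 85.7, 84.8, 84.6, 84.0, 83.4, 83.3, 83.0]


def summarize(scores, threshold=80.0):
    """Return the top score, the mean score, and the count at or above threshold."""
    return {
        "top": max(scores),
        "average": round(mean(scores), 1),
        "high_performers": sum(s >= threshold for s in scores),
    }


summary = summarize(visible_scores)
print(summary)
```

For these ten entries the average is about 85.1% and all ten clear the 80% threshold, which is consistent with the page reporting 18 high performers across the full set.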
...