GPQA

text
About

GPQA (Graduate-Level Google-Proof Q&A) is a challenging AI benchmark of 448 PhD-level multiple-choice questions in physics, chemistry, and biology. The questions were written by domain experts and require deep scientific understanding and multi-step reasoning rather than simple factual recall. The "Google-proof" design is meant to keep answers from being easily found through web searches, which makes the benchmark a test of genuine comprehension rather than retrieval.

Evaluation Stats
Total models: 100
Organizations: 13
Verified results: 0
Self-reported results: 100
Benchmark Details
Max score: 1
Language: en
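A maximum score of 1 is consistent with scoring each model by accuracy, meaning the fraction of multiple-choice questions it answers correctly. The minimal sketch below assumes exactly that. The `Graded` record, its field names, and the "A" to "D" option labels are hypothetical illustrations, not the benchmark's official harness. Because each GPQA question has four answer options, random guessing would score about 0.25 on this scale.

```python
from dataclasses import dataclass


# Hypothetical record for one graded question (field names are assumptions).
@dataclass
class Graded:
    question_id: str
    correct: str    # gold option label, e.g. "A" to "D"
    predicted: str  # option label the model chose


def accuracy(results: list[Graded]) -> float:
    """Fraction of questions answered correctly, always in [0, 1]."""
    if not results:
        raise ValueError("no results to score")
    hits = sum(r.predicted == r.correct for r in results)
    return hits / len(results)


results = [
    Graded("q1", "B", "B"),
    Graded("q2", "D", "A"),
    Graded("q3", "C", "C"),
    Graded("q4", "A", "A"),
]
print(f"{accuracy(results):.1%}")  # 75.0%
```

Scores on this page are shown as percentages of that maximum, so an accuracy of 0.75 would appear as 75.0%.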
Performance Overview
Score distribution and top performers

(Score distribution chart not reproduced.)
Models evaluated: 100
Top score: 88.4%
Average score: 64.1%
High performers (80%+): 18

Top Organizations

#1 Zhipu AI (3 models): 78.4%
#2 xAI (6 models): 75.3%
#3 OpenAI (20 models): 69.3%
#4 Moonshot AI (4 models): 68.5%
#5 Anthropic (10 models): 66.3%
Leaderboard
100 models ranked by performance on GPQA

Rank  Released      License      Score
1     Jul 9, 2025   Proprietary  88.4%
2     Jul 9, 2025   Proprietary  87.5%
3     Jun 5, 2025   Proprietary  86.4%
4     Aug 7, 2025   Proprietary  85.7%
5     Feb 24, 2025  Proprietary  84.8%
6     Feb 17, 2025  Proprietary  84.6%
7     Feb 17, 2025  Proprietary  84.0%
8     Sep 29, 2025  Proprietary  83.4%
9     Apr 16, 2025  Proprietary  83.3%
10    May 20, 2025  Proprietary  83.0%

Showing 1 to 10 of 100 models.
...
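The summary figures in the performance overview (top score, average score, and the count of high performers at 80% or above) can be sketched as a small function over a list of scores. This is a minimal illustration under stated assumptions: "average" is taken to be the arithmetic mean, and the 80% threshold is assumed to be inclusive. The page's own figures (88.4% top, 64.1% average, 18 high performers) cover all 100 models, which are not all listed here, so the example runs only on the ten scores shown on the first leaderboard page.

```python
from statistics import mean


def summarize(scores: list[float], threshold: float = 0.80) -> dict:
    """Top score, arithmetic mean, and count of scores >= threshold."""
    if not scores:
        raise ValueError("no scores to summarize")
    return {
        "top": max(scores),
        "average": mean(scores),
        # Assumption: a score exactly at the threshold counts as "high".
        "high_performers": sum(s >= threshold for s in scores),
    }


# The ten scores listed on the first leaderboard page, as fractions of 1.
first_page = [0.884, 0.875, 0.864, 0.857, 0.848,
              0.846, 0.840, 0.834, 0.833, 0.830]
stats = summarize(first_page)
print(stats["top"], stats["high_performers"])  # 0.884 10
print(round(stats["average"], 4))              # 0.8511
```

The first-page average (about 85.1%) is far above the page-wide 64.1% because it excludes the 90 lower-ranked models.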
Resources