GPQA

text
About

GPQA (Graduate-Level Google-Proof Q&A) is a challenging AI benchmark of 448 PhD-level multiple-choice questions in physics, chemistry, and biology. The questions are written by domain experts and require deep scientific understanding and multi-step reasoning rather than simple factual recall. The "Google-proof" design is intended to keep answers from being found through a quick web search, so the benchmark tests reasoning rather than lookup.

Evaluation Stats
Total Models: 100
Organizations: 13
Verified Results: 0
Self-Reported: 100
Benchmark Details
Max Score: 1
Language: en
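
The maximum score of 1 alongside leaderboard values shown as percentages suggests that a score is the fraction of questions answered correctly. The following is a minimal sketch of that calculation, not the leaderboard's actual scoring code. The function names `accuracy` and `as_percent` and the sample answers are hypothetical, and answers are assumed to be single option letters.

```python
from typing import Sequence


def accuracy(predicted: Sequence[str], gold: Sequence[str]) -> float:
    """Return the fraction of questions answered correctly, in [0, 1]."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("need at least one question")
    # Compare option letters case-insensitively, ignoring stray whitespace.
    correct = sum(
        p.strip().upper() == g.strip().upper() for p, g in zip(predicted, gold)
    )
    return correct / len(gold)


def as_percent(score: float) -> str:
    """Format a 0-1 score the way the leaderboard displays it."""
    return f"{score * 100:.1f}%"


# Hypothetical answer key and model answers for five questions.
gold = ["A", "C", "B", "D", "A"]
predicted = ["A", "C", "D", "D", "b"]

score = accuracy(predicted, gold)
print(score, as_percent(score))
```

Under this reading, a displayed score of 88.4% corresponds to a raw score of 0.884 on the 0 to 1 scale.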
Performance Overview
Score distribution and top performers

Score Distribution

Models: 100
Top Score: 88.4%
Average Score: 65.2%
High Performers (80%+): 19
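
Summary figures like these can be recomputed from a list of scores. The sketch below assumes that "80%+" means a score of at least 80.0, and it uses only the ten scores visible in the leaderboard on this page, so its average will not match the 65.2% reported for all 100 models. The function name `summarize` is hypothetical.

```python
from statistics import mean


def summarize(scores, threshold=80.0):
    """Compute count, top score, average, and number of scores >= threshold."""
    if not scores:
        raise ValueError("scores must not be empty")
    return {
        "models": len(scores),
        "top": max(scores),
        "average": round(mean(scores), 1),
        # Assumption: "80%+" is inclusive of exactly 80.0.
        "high_performers": sum(s >= threshold for s in scores),
    }


# The ten scores shown on the first leaderboard page (percent).
visible_scores = [88.4, 87.5, 86.4, 85.7, 85.7, 84.8, 84.6, 84.0, 83.4, 83.3]

summary = summarize(visible_scores)
print(summary)
```

Running the same function over all 100 scores, if available, would be one way to check the published top score, average, and high-performer count.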

Top Organizations

#1 Zhipu AI (3 models): 78.4%
#2 xAI (7 models): 76.7%
#3 OpenAI (19 models): 70.8%
#4 Anthropic (10 models): 69.6%
#5 Moonshot AI (4 models): 68.5%
Leaderboard
100 models ranked by performance on GPQA
Rank  Date          License      Score
1     Jul 9, 2025   Proprietary  88.4%
2     Jul 9, 2025   Proprietary  87.5%
3     Jun 5, 2025   Proprietary  86.4%
4     Aug 28, 2025  Proprietary  85.7%
5     Aug 7, 2025   Proprietary  85.7%
6     Feb 24, 2025  Proprietary  84.8%
7     Feb 17, 2025  Proprietary  84.6%
8     Feb 17, 2025  Proprietary  84.0%
9     Sep 29, 2025  Proprietary  83.4%
10    Apr 16, 2025  Proprietary  83.3%
Showing 1 to 10 of 100 models
Resources