GPQA

text

About

GPQA (Graduate-Level Google-Proof Q&A) is a challenging AI benchmark of 448 PhD-level multiple-choice questions in physics, chemistry, and biology. The questions were written by domain experts and require deep scientific understanding and multi-step reasoning rather than simple factual recall. They are designed to be "Google-proof": the answers are hard to find through web searches, so the benchmark tests reasoning rather than lookup.
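
As a rough illustration of how a score on a multiple-choice benchmark like this is typically computed, the sketch below takes accuracy as the fraction of questions answered correctly. This gives a value between 0 and 1 (the page lists a maximum score of 1), which can then be shown as a percentage. The function name `accuracy`, the A-D answer labels, and the toy data are assumptions made for illustration; this is not the official GPQA evaluation code.

```python
def accuracy(predictions, answers):
    """Return the fraction of questions answered correctly, in [0, 1]."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("at least one question is required")
    # Count positions where the predicted choice matches the reference choice.
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)


# Hypothetical toy data: four questions with answer choices labelled A-D.
answers = ["A", "C", "B", "D"]
predictions = ["A", "C", "D", "D"]

score = accuracy(predictions, answers)
print(f"{score:.3f} ({score:.1%})")  # prints "0.750 (75.0%)"
```

Reported numbers can still differ between sources, since prompting and answer-extraction choices vary and every result listed on this page is self-reported.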

Evaluation Stats

Total Models: 100

Organizations: 13

Verified Results: 0

Self-Reported: 100

Benchmark Details

Max Score: 1

Language

Performance Overview

Score distribution and top performers

Score Distribution

100 models

Top Score

88.4%

Average Score

65.2%

High Performers (80%+)

Top Organizations

#1	Zhipu AI	3 models	78.4%
#2	xAI	7 models	76.7%
#3	OpenAI	19 models	70.8%
#4	Anthropic	10 models	69.6%
#5	Moonshot AI	4 models	68.5%

Leaderboard

100 models ranked by performance on GPQA

Rank	Model	Organization	Released	License	Score
#01	Grok-4 Heavy	xAI	Jul 9, 2025	Proprietary	88.4%
#02	Grok-4	xAI	Jul 9, 2025	Proprietary	87.5%
#03	Gemini 2.5 Pro Preview 06-05	Google	Jun 5, 2025	Proprietary	86.4%
#04	Grok 4 Fast	xAI	Aug 28, 2025	Proprietary	85.7%
#05	GPT-5	OpenAI	Aug 7, 2025	Proprietary	85.7%
#06	Claude 3.7 Sonnet	Anthropic	Feb 24, 2025	Proprietary	84.8%
#07	Grok-3	xAI	Feb 17, 2025	Proprietary	84.6%
#08	Grok-3 Mini	xAI	Feb 17, 2025	Proprietary	84.0%
#09	Claude Sonnet 4.5	Anthropic	Sep 29, 2025	Proprietary	83.4%
#10	o3	OpenAI	Apr 16, 2025	Proprietary	83.3%

Showing 1 to 10 of 100 models

Resources

Research Paper