BrowseComp
Category: Agents
About
BrowseComp evaluates AI agents' web browsing and information-seeking persistence across 1,266 tasks that require navigating the live internet to find entangled, hard-to-locate information.
Evaluation Stats
Total Models: 19
Organizations: 5
Verified Results: 0
Self-Reported: 0
Benchmark Details
Max Score: 100
Performance Overview
Score distribution and top performers
Score Distribution: 19 models
Top Score: 74.7%
Average Score: 10.2%
High Performers (80%+): 0

Top Organizations
| Rank | Organization | Models | Avg Score |
|---|---|---|---|
| 1 | Anthropic | 3 | 25.5% |
| 2 | Google DeepMind | 3 | 23.2% |
| 3 | OpenAI | 8 | 4.8% |
| 4 | DeepSeek | 2 | 2.2% |
| 5 | xAI | 3 | 1.3% |
Leaderboard
19 models ranked by performance on BrowseComp
| Rank | Model | Release Date | License | Score |
|---|---|---|---|---|
| 1 | | Feb 17, 2026 | Proprietary | 74.7% |
| 2 | Gemini 3 Pro | Nov 18, 2025 | Proprietary | 59.2% |
| 3 | GPT-5 | Aug 7, 2025 | Proprietary | 20.1% |
| 4 | Gemini 2.5 Pro | Mar 25, 2025 | Proprietary | 7.8% |
| 5 | o1 | Dec 5, 2024 | Proprietary | 6.3% |
| 6 | GPT-4.1 | Apr 14, 2025 | Proprietary | 3.6% |
| 7 | o4 mini | Apr 16, 2025 | Proprietary | 3.3% |
| 8 | DeepSeek-R1 | Jan 20, 2025 | MIT | 3.1% |
| 9 | Gemini 2.5 Flash | Apr 17, 2025 | Proprietary | 2.5% |
| 10 | Grok 3 mini | Feb 17, 2025 | Proprietary | 1.7% |
Showing the top 10 of 19 models.
Additional Metrics
Extended metrics for top models on BrowseComp
| Model | Score (%) |
|---|---|
| Gemini 3 Pro | 59.2 |
| GPT-5 | 20.1 |
| Gemini 2.5 Pro | 7.8 |
| o1 | 6.3 |
| GPT-4.1 | 3.6 |
| o4 mini | 3.3 |
| DeepSeek-R1 | 3.1 |
| Gemini 2.5 Flash | 2.5 |
| Grok 3 mini | 1.7 |
| GPT-OSS-120B | 1.6 |
| o3 mini | 1.5 |
| Grok 3 | 1.3 |
| DeepSeek-V3.1 | 1.3 |
| GPT-OSS-20B | 1.3 |
| Claude Opus 4.1 | 1.0 |
| Grok 2 | 1.0 |
| o1 mini | 0.9 |
| Claude 3.5 Sonnet | 0.9 |