BrowseComp
Category: Agents
About
BrowseComp evaluates AI agents' web browsing and information-seeking persistence across 1,266 tasks that require navigating the live internet to find entangled, hard-to-locate information.
Evaluation Stats
Total Models: 19
Organizations: 5
Verified Results: 0
Self-Reported: 0
Benchmark Details
Max Score: 100
Performance Overview
Score distribution and top performers
Score Distribution: 19 models
Top Score: 74.7%
Average Score: 10.2%
High Performers (80%+): 0

Top Organizations
| Rank | Organization | Models | Avg Score |
|---|---|---|---|
| 1 | Anthropic | 3 | 25.5% |
| 2 | Google DeepMind | 3 | 23.2% |
| 3 | OpenAI | 8 | 4.8% |
| 4 | DeepSeek | 2 | 2.2% |
| 5 | xAI | 3 | 1.3% |
Leaderboard
19 models ranked by performance on BrowseComp
| Rank | Model | Release Date | License | Score |
|---|---|---|---|---|
| 1 | | Feb 17, 2026 | Proprietary | 74.7% |
| 2 | Gemini 3 Pro | Nov 18, 2025 | Proprietary | 59.2% |
| 3 | GPT-5 | Aug 7, 2025 | Proprietary | 20.1% |
| 4 | Gemini 2.5 Pro | Mar 25, 2025 | Proprietary | 7.8% |
| 5 | o1 | Dec 5, 2024 | Proprietary | 6.3% |
| 6 | GPT-4.1 | Apr 14, 2025 | Proprietary | 3.6% |
| 7 | o4 mini | Apr 16, 2025 | Proprietary | 3.3% |
| 8 | DeepSeek-R1 | Jan 20, 2025 | MIT | 3.1% |
| 9 | Gemini 2.5 Flash | Apr 17, 2025 | Proprietary | 2.5% |
| 10 | Grok 3 mini | Feb 17, 2025 | Proprietary | 1.7% |
Showing the top 10 of 19 models.
Additional Metrics
Extended metrics for top models on BrowseComp
| Model | Score (%) |
|---|---|
| Gemini 3 Pro | 59.2 |
| GPT-5 | 20.1 |
| Gemini 2.5 Pro | 7.8 |
| o1 | 6.3 |
| GPT-4.1 | 3.6 |
| o4 mini | 3.3 |
| DeepSeek-R1 | 3.1 |
| Gemini 2.5 Flash | 2.5 |
| Grok 3 mini | 1.7 |
| GPT-OSS-120B | 1.6 |
| o3 mini | 1.5 |
| Grok 3 | 1.3 |
| DeepSeek-V3.1 | 1.3 |
| GPT-OSS-20B | 1.3 |
| Claude Opus 4.1 | 1.0 |
| Grok 2 | 1.0 |
| o1 mini | 0.9 |
| Claude 3.5 Sonnet | 0.9 |