BrowseComp

About

BrowseComp is a challenging benchmark that measures AI agents' ability to locate hard-to-find information through web browsing. It comprises 1,266 problems, each with a single, indisputable answer. Questions are designed so that the answer is difficult to find but easy to verify, so the benchmark tests whether a system can apply sophisticated search strategies to surface specific, obscure details.

Evaluation Stats
Total Models: 9
Organizations: 3
Verified Results: 0
Self-Reported: 9
Benchmark Details
Max Score: 1
Language: en
Sub-benchmarks: 3
Performance Overview
Score distribution and top performers

Score Distribution

Models: 9
Top Score: 54.9%
Average Score: 36.4%
High Performers (80%+): 0

Top Organizations

#1 OpenAI: 3 models, 52.0%
#2 Zhipu AI: 3 models, 30.9%
#3 DeepSeek: 3 models, 26.3%
Leaderboard
9 models ranked by performance on BrowseComp
Rank | Date | License | Score
#1 | Aug 7, 2025 | Proprietary | 54.9%
#2 | Apr 16, 2025 | Proprietary | 51.5%
#3 | Apr 16, 2025 | Proprietary | 49.7%
#4 | Sep 30, 2025 | MIT | 45.1%
#5 | Sep 29, 2025 | MIT | 40.1%
#6 | Jan 10, 2025 | MIT | 30.0%
#7 | Jul 28, 2025 | MIT | 26.4%
#8 | Jul 28, 2025 | MIT | 21.3%
#9 | May 28, 2025 | MIT | 8.9%
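
The summary figures under Performance Overview can be recomputed from the nine leaderboard scores. The following is a minimal sketch in Python, assuming the scores are the percentages listed in the leaderboard and that "High Performers" counts scores at or above 80%. The function name `summarize` and its `high_threshold` parameter are illustrative and not part of any published BrowseComp tooling.

```python
from statistics import mean

# Scores (in percent) copied from the leaderboard, in rank order.
scores = [54.9, 51.5, 49.7, 45.1, 40.1, 30.0, 26.4, 21.3, 8.9]


def summarize(scores, high_threshold=80.0):
    """Return model count, top score, mean score, and high-performer count.

    `high_threshold` is an assumed cutoff matching the "80%+" label.
    """
    return {
        "models": len(scores),
        "top": max(scores),
        "mean": round(mean(scores), 1),
        "high_performers": sum(s >= high_threshold for s in scores),
    }


summary = summarize(scores)
print(summary)
```

With the nine scores listed, this gives a top score of 54.9 and a mean of 36.4, with no model reaching the 80% threshold, which agrees with the reported overview.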
Resources