BrowseComp
About
BrowseComp is a challenging benchmark that measures an AI agent's ability to locate hard-to-find information by browsing the web. It comprises 1,266 problems, each with a single, indisputable answer. The answers are designed to be hard to find but easy to verify, so the benchmark tests whether a system can apply sophisticated search strategies to pin down specific, obscure details.
Evaluation Stats
Total Models: 9
Organizations: 3
Verified Results: 0
Self-Reported: 9
Benchmark Details
Max Score: 1
Language: en
Sub-benchmarks: 3
Sub-benchmarks
3 related benchmarks
Performance Overview
Score distribution and top performers
Score Distribution (9 models)
Top Score: 54.9%
Average Score: 36.4%
High Performers (80%+): 0
Top Organizations
#1 OpenAI: 3 models, 52.0%
#2 Zhipu AI: 3 models, 30.9%
#3 DeepSeek: 3 models, 26.3%
Leaderboard
9 models ranked by performance on BrowseComp
Date | License | Score
---|---|---
Aug 7, 2025 | Proprietary | 54.9%
Apr 16, 2025 | Proprietary | 51.5%
Apr 16, 2025 | Proprietary | 49.7%
Sep 30, 2025 | MIT | 45.1%
Sep 29, 2025 | MIT | 40.1%
Jan 10, 2025 | MIT | 30.0%
Jul 28, 2025 | MIT | 26.4%
Jul 28, 2025 | MIT | 21.3%
May 28, 2025 | MIT | 8.9%
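
The summary figures under Performance Overview can be cross-checked against the score column of the leaderboard above. The following is a minimal sketch of that arithmetic, assuming the average is a plain unweighted mean rounded to one decimal place and that "high performer" means a score of at least 80%. The function name `summarize` and its parameter are illustrative, not part of any BrowseComp tooling.

```python
# Scores (in percent) copied from the leaderboard table, highest first.
scores = [54.9, 51.5, 49.7, 45.1, 40.1, 30.0, 26.4, 21.3, 8.9]


def summarize(scores, high_threshold=80.0):
    """Recompute the overview statistics from a list of percentage scores.

    Assumption: the average is an unweighted mean rounded to one decimal,
    and a high performer is any model scoring at or above high_threshold.
    """
    return {
        "models": len(scores),
        "top": max(scores),
        "average": round(sum(scores) / len(scores), 1),
        "high_performers": sum(1 for s in scores if s >= high_threshold),
    }


summary = summarize(scores)
print(summary)
```

Under these assumptions the recomputed values agree with the page: 9 models, a top score of 54.9%, an average of 36.4%, and no model at or above 80%.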