BrowseComp Long Context 128k

About

BrowseComp Long Context 128k extends the original BrowseComp benchmark to evaluate how well AI agents retrieve information from context windows of up to 128,000 tokens. This variant tests whether models maintain performance when processing longer documents and more complex information structures: agents must locate specific information within extensive text while remaining accurate and efficient.

Evaluation Stats
Total Models: 1
Organizations: 1
Verified Results: 0
Self-Reported: 1
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers

Score Distribution

1 model
Top Score: 90.0%
Average Score: 90.0%
High Performers (80%+): 1

Top Organizations

#1 OpenAI: 1 model, 90.0%
Leaderboard
1 model ranked by performance on BrowseComp Long Context 128k
Date: Aug 7, 2025
License: Proprietary
Score: 90.0%
Resources