BrowseComp Long Context 128k
text
About
BrowseComp Long Context 128k extends the original BrowseComp benchmark to evaluate AI agents' information retrieval capabilities with extended context windows up to 128,000 tokens. This variant tests how well models can maintain performance when processing longer documents and more complex information structures. It challenges agents to locate specific information within extensive textual content while maintaining accuracy and efficiency.
Evaluation Stats
Total Models: 1
Organizations: 1
Verified Results: 0
Self-Reported: 1
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers
Score Distribution
1 model
Top Score
90.0%
Average Score
90.0%
High Performers (80%+)
1
Top Organizations
#1 OpenAI
1 model
90.0%
Leaderboard
1 model ranked by performance on BrowseComp Long Context 128k