BrowseComp Long Context 256k
text
About
BrowseComp Long Context 256k tests AI agents' browsing capabilities with context windows of up to 256,000 tokens. This variant measures how well models handle very large amounts of text while still locating specific, hard-to-find details. It is a demanding test of long-context information retrieval and processing.
Evaluation Stats
Total Models: 1
Organizations: 1
Verified Results: 0
Self-Reported: 1
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers
Score Distribution: 1 model
Top Score: 88.8%
Average Score: 88.8%
High Performers (80%+): 1
Top Organizations
#1 OpenAI: 1 model, 88.8%
Leaderboard
1 model ranked by performance on BrowseComp Long Context 256k