BrowseComp Long Context 128k

text

About

BrowseComp Long Context 128k extends the original BrowseComp benchmark to evaluate how well AI agents retrieve information when the context window grows to as many as 128,000 tokens. It tests whether models sustain their performance on longer documents and more complex information structures, requiring agents to locate specific information within extensive text while remaining accurate and efficient.

Evaluation Stats

Total Models: 1

Organizations: 1

Verified Results: 0

Self-Reported: 1

Benchmark Details

Max Score: 1

Language

Performance Overview

Score distribution and top performers

Score Distribution

1 model

Top Score

90.0%

Average Score

90.0%

High Performers (80%+)

1

Top Organizations

#1 OpenAI

1 model

90.0%

Leaderboard

1 model ranked by performance on BrowseComp Long Context 128k

Rank	Model	Organization	Release Date	License	Score	Links
#01	GPT-5	OpenAI	Aug 7, 2025	Proprietary	90.0%
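
The overview figures (top score, average score, high performers) can be recomputed from the leaderboard rows. The following is a minimal Python sketch, not the site's actual method: the `summarize` helper and its field names are hypothetical, and the 80% cutoff is assumed from the "High Performers (80%+)" label. The data is the single row listed in the leaderboard.

```python
from statistics import mean

# Leaderboard rows as (model, organization, score in percent),
# copied from the single entry in the leaderboard.
LEADERBOARD = [("GPT-5", "OpenAI", 90.0)]


def summarize(rows, high_threshold=80.0):
    """Hypothetical helper: recompute the overview statistics from rows.

    high_threshold is an assumption based on the "High Performers (80%+)" label.
    """
    scores = [score for _, _, score in rows]
    return {
        "total_models": len(rows),
        "organizations": len({org for _, org, _ in rows}),
        "top_score": max(scores),
        "average_score": mean(scores),
        "high_performers": sum(score >= high_threshold for score in scores),
    }


summary = summarize(LEADERBOARD)
print(summary)
```

With only one model listed, the top score and the average score coincide, which is why both read 90.0% in the overview.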

Resources

Research Paper