BrowseComp Long Context 256k

text

About

BrowseComp Long Context 256k tests AI agents' browsing capabilities with context windows of up to 256,000 tokens. This variant measures how well models handle very large amounts of text while still locating specific, hard-to-find details, which makes it a demanding test of long-context information retrieval and processing.

Evaluation Stats

Total Models: 1

Organizations: 1

Verified Results: 0

Self-Reported: 1

Benchmark Details

Max Score: 1

Language

Performance Overview

Score distribution and top performers

Score Distribution

1 model

Top Score

88.8%

Average Score

88.8%

High Performers (80%+)

Top Organizations

#1 OpenAI

1 model

88.8%

Leaderboard

1 model ranked by performance on BrowseComp Long Context 256k

Rank	Model	Organization	Date	License	Score
#1	GPT-5	OpenAI	Aug 7, 2025	Proprietary	88.8%
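
The overview figures (total models, organizations, top score, average score, and the 80%+ high-performer count) can in principle be recomputed from the leaderboard rows. The following is a minimal sketch of that arithmetic, not the site's actual implementation: the `summarize` helper, the tuple layout, and the rounding to one decimal place are assumptions made for illustration, and the single row is copied from the leaderboard.

```python
from statistics import mean

# Leaderboard rows as (model, organization, score in percent),
# copied from the leaderboard; the tuple layout is an assumption.
leaderboard = [("GPT-5", "OpenAI", 88.8)]

# The "80%+" cutoff named in the performance overview.
HIGH_PERFORMER_THRESHOLD = 80.0


def summarize(rows, threshold=HIGH_PERFORMER_THRESHOLD):
    """Hypothetical helper: derive the overview stats from leaderboard rows."""
    scores = [score for _, _, score in rows]
    return {
        "total_models": len(rows),
        "organizations": len({org for _, org, _ in rows}),
        "top_score": max(scores),
        # Rounded to one decimal to match how scores are displayed.
        "average_score": round(mean(scores), 1),
        "high_performers": sum(score >= threshold for score in scores),
    }


summary = summarize(leaderboard)
print(summary)
```

With only one model listed, the top score and the average score coincide, which is why both read 88.8% in the overview.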

Resources

Research Paper