LongBench v2

Tags: Multilingual · text
About

LongBench v2 is an advanced long-context evaluation benchmark designed to assess LLMs' ability to handle complex problems that require deep understanding and reasoning across a range of realistic tasks. It tests a model's sustained attention, information integration, and reasoning while processing extended contexts, measuring performance on tasks that demand comprehensive understanding of lengthy documents.
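
The scores below are multiple-choice accuracy. As a rough illustration of how a self-reported number might be produced, here is a minimal Python sketch. It assumes the dataset is distributed as THUDM/LongBench-v2 on Hugging Face with fields context, question, choice_A through choice_D, and answer (names taken from the public dataset card; verify against your copy), and uses query_model as a hypothetical placeholder for your own model call:

    from datasets import load_dataset

    def query_model(prompt: str) -> str:
        # Hypothetical stand-in: send the prompt to your LLM and return its reply.
        raise NotImplementedError

    # Single "train" split per the dataset card (an assumption; adjust if yours differs).
    dataset = load_dataset("THUDM/LongBench-v2", split="train")

    correct = 0
    for item in dataset:
        prompt = (
            f"{item['context']}\n\n"
            f"Question: {item['question']}\n"
            f"A. {item['choice_A']}\nB. {item['choice_B']}\n"
            f"C. {item['choice_C']}\nD. {item['choice_D']}\n"
            "Answer with a single letter (A, B, C, or D)."
        )
        reply = query_model(prompt).strip().upper()
        if reply[:1] == item["answer"]:  # gold label is a single letter A-D
            correct += 1

    print(f"Accuracy: {100 * correct / len(dataset):.1f}%")

In practice, published evaluations also handle truncation of contexts that exceed the model's window and more robust answer extraction; this sketch only shows the shape of the accuracy computation.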

Evaluation Stats
Total Models: 1
Organizations: 1
Verified Results: 0
Self-Reported: 1
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers

Score Distribution

1 model
Top Score: 48.7%
Average Score: 48.7%
High Performers (80%+): 0

Top Organizations

#1 DeepSeek (1 model): 48.7%
Leaderboard
1 model ranked by performance on LongBench v2
Rank 1: DeepSeek
Date: Dec 25, 2024
License: MIT + Model License (Commercial use allowed)
Score: 48.7%
Resources