BIG-Bench Extra Hard

text

About

BIG-Bench Extra Hard (BBEH) is a benchmark designed to test LLM reasoning at a difficulty level that state-of-the-art models consistently struggle with. Its tasks target advanced reasoning and complex problem-solving. BBEH serves as a frontier evaluation tool for measuring progress toward more capable AI systems.

Evaluation Stats

Total Models: 5

Organizations: 1

Verified Results: 0

Self-Reported: 5

Benchmark Details

Max Score: 1

Language

Performance Overview

Score distribution and top performers

Score Distribution: 5 models

Top Score: 19.3%

Average Score: 13.8%

High Performers (80%+): 0

Top Organizations

#1	Google	5 models	13.8% average score

Leaderboard

5 models ranked by performance on BIG-Bench Extra Hard

Rank	Model	Organization	Release Date	License	Score
#01	Gemma 3 27B	Google	Mar 12, 2025	Gemma	19.3%
#02	Gemma 3 12B	Google	Mar 12, 2025	Gemma	16.3%
#03	Gemini Diffusion	Google	May 20, 2025	Proprietary	15.0%
#04	Gemma 3 4B	Google	Mar 12, 2025	Gemma	11.0%
#05	Gemma 3 1B	Google	Mar 12, 2025	Gemma	7.2%
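
The summary figures in the Performance Overview can be re-derived from the five leaderboard rows. The sketch below is illustrative only: the `scores` dictionary is copied by hand from the table, and the `summarize` helper and its `high_threshold` parameter are hypothetical names, not part of any BBEH tooling. It assumes the average is a plain arithmetic mean rounded to one decimal place.

```python
from statistics import mean

# Scores in percent, copied from the leaderboard rows above.
scores = {
    "Gemma 3 27B": 19.3,
    "Gemma 3 12B": 16.3,
    "Gemini Diffusion": 15.0,
    "Gemma 3 4B": 11.0,
    "Gemma 3 1B": 7.2,
}


def summarize(scores, high_threshold=80.0):
    """Return leaderboard summary stats (hypothetical helper for illustration)."""
    # Rank models from highest to lowest score.
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    top_model, top_score = ranked[0]
    return {
        "total_models": len(scores),
        "top_model": top_model,
        "top_score": top_score,
        # Assumption: the page reports an unweighted mean, rounded to 0.1.
        "average_score": round(mean(scores.values()), 1),
        # Count of models at or above the "high performer" cutoff.
        "high_performers": sum(1 for s in scores.values() if s >= high_threshold),
    }


if __name__ == "__main__":
    for key, value in summarize(scores).items():
        print(f"{key}: {value}")
```

Under these assumptions the mean of the five scores is 68.8 / 5 = 13.76, which rounds to the 13.8% shown on the page, and no model reaches the 80% cutoff.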

Resources

Research Paper