BIG-Bench Extra Hard

Modality: text
About

BIG-Bench Extra Hard (BBEH) is a very challenging benchmark for evaluating LLM reasoning at a difficulty level that current models have not yet reached. Its tasks consistently challenge state-of-the-art models and test advanced reasoning and complex problem-solving. BBEH is intended as a frontier evaluation for measuring progress toward AI systems that can handle the most demanding reasoning problems.

Evaluation Stats
Total models: 5
Organizations: 1
Verified results: 0
Self-reported results: 5
Benchmark Details
Max score: 1
Language: en
Performance Overview
Score distribution and top performers

Models: 5
Top score: 19.3%
Average score: 13.8%
High performers (80%+): 0

Top Organizations

#1 Google: 5 models, average score 13.8%
Leaderboard
5 models ranked by performance on BIG-Bench Extra Hard

Rank  Date          License      Score
1     Mar 12, 2025  Gemma        19.3%
2     Mar 12, 2025  Gemma        16.3%
3     May 20, 2025  Proprietary  15.0%
4     Mar 12, 2025  Gemma        11.0%
5     Mar 12, 2025  Gemma         7.2%
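The summary figures in the Performance Overview (top score, average score, number of high performers) can be recomputed from the five leaderboard scores. The following is a minimal Python sketch under a few assumptions: the average is taken to be an unweighted mean over models, "High Performers (80%+)" is taken to mean a score of at least 80%, and the percentages are taken to be fractions of the max score of 1. The variable names are illustrative only.

```python
# Leaderboard scores for BIG-Bench Extra Hard, in percent, in rank order.
# Model names are not listed on the page, so only the scores are used.
scores = [19.3, 16.3, 15.0, 11.0, 7.2]

# Top score: the highest score on the leaderboard.
top_score = max(scores)

# Average score: assumed to be an unweighted mean over all listed models.
average_score = round(sum(scores) / len(scores), 1)

# High performers: assumed to mean models scoring at least 80%.
high_performers = sum(1 for s in scores if s >= 80.0)

# Assumption: with a max score of 1, a percentage divided by 100 gives the raw score.
top_fraction = top_score / 100

print(f"Top score: {top_score}%")
print(f"Average score: {average_score}%")
print(f"High performers (80%+): {high_performers}")
print(f"Top score as a fraction of the max score: {top_fraction:.3f}")
```

The recomputed values (19.3%, 13.8%, 0) match the figures shown in the Performance Overview. Because all five models come from a single organization, the Google average is the same as the overall average.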
Resources