BIG-Bench Extra Hard
text
About
BIG-Bench Extra Hard (BBEH) is a benchmark designed to evaluate LLM reasoning at a difficulty level that current models do not yet handle well. It collects tasks that state-of-the-art models consistently struggle with, covering advanced reasoning and complex problem-solving. BBEH is intended as a frontier evaluation tool for tracking progress toward more capable AI systems.
Evaluation Stats
Total Models: 5
Organizations: 1
Verified Results: 0
Self-Reported: 5
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers
Score Distribution: 5 models
Top Score: 19.3%
Average Score: 13.8%
High Performers (80%+): 0
Top Organizations
#1 Google: 5 models, 13.8%
Leaderboard
5 models ranked by performance on BIG-Bench Extra Hard
Date | License | Score
---|---|---
Mar 12, 2025 | Gemma | 19.3%
Mar 12, 2025 | Gemma | 16.3%
May 20, 2025 | Proprietary | 15.0%
Mar 12, 2025 | Gemma | 11.0%
Mar 12, 2025 | Gemma | 7.2%
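
The summary figures in the Performance Overview can be reproduced from the five leaderboard scores. The sketch below is only a sanity check: it assumes the average is a plain arithmetic mean rounded to one decimal place and that the 80% high-performer threshold is inclusive, neither of which the page states. The variable names are illustrative.

```python
from statistics import mean

# Scores copied from the leaderboard rows, in percent.
scores = [19.3, 16.3, 15.0, 11.0, 7.2]

# Assumption: the page reports a simple mean rounded to one decimal.
top_score = max(scores)
average_score = round(mean(scores), 1)

# Assumption: "High Performers (80%+)" counts scores at or above 80%.
high_performers = sum(1 for s in scores if s >= 80.0)

print(f"Top score: {top_score}%")
print(f"Average score: {average_score}%")
print(f"High performers (80%+): {high_performers}")
```

Under these assumptions the results agree with the reported top score of 19.3%, average of 13.8%, and zero high performers.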