BIG-Bench

Tags: Multilingual, text
About

BIG-Bench (Beyond the Imitation Game Benchmark) is a collaborative evaluation suite of over 200 tasks designed to probe large language models and extrapolate their future capabilities. The benchmark covers diverse cognitive abilities, including reasoning, knowledge, language understanding, and specialized skills. BIG-Bench serves as a foundational evaluation framework for assessing current AI capabilities and for predicting how performance will scale as models improve.
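
To make the task format concrete, here is a minimal sketch of scoring a model on a single BIG-Bench task, assuming the benchmark's JSON task layout in which generative tasks provide an "examples" list of "input"/"target" pairs. The file name "task.json" and the predict_fn callable are placeholders for illustration, not part of any official tooling.

```python
import json


def load_bigbench_examples(path):
    """Load the examples list from a BIG-Bench JSON task file (hypothetical local path)."""
    with open(path) as f:
        task = json.load(f)
    return task["examples"]  # generative tasks: each example has "input" and "target"


def exact_match_accuracy(predict_fn, examples):
    """Score predictions with simple exact match; "target" may be a string or a list of strings."""
    correct = 0
    for ex in examples:
        prediction = predict_fn(ex["input"]).strip()
        targets = ex["target"] if isinstance(ex["target"], list) else [ex["target"]]
        if any(prediction == t.strip() for t in targets):
            correct += 1
    return correct / len(examples)


if __name__ == "__main__":
    examples = load_bigbench_examples("task.json")          # hypothetical file
    score = exact_match_accuracy(lambda prompt: "yes", examples)  # dummy model for illustration
    print(f"exact-match accuracy: {score:.3f}")
```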

Evaluation Stats
Total Models: 3
Organizations: 1
Verified Results: 0
Self-Reported: 2

Benchmark Details
Max Score: 1
Language: en

Performance Overview
Score distribution and top performers

Score Distribution (3 models)
Top Score: 75.0%
Average Score: 72.7%
High Performers (80%+): 0

Top Organizations
#1 Google: 3 models, 72.7%
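
As a quick check, the overview figures above can be recomputed directly from the three scores listed in the Leaderboard section below; this sketch uses only the page's own reported numbers.

```python
# Recompute the Performance Overview figures from the three leaderboard scores on this page.
scores = [75.0, 74.9, 68.2]  # percentages, as reported below

top_score = max(scores)                            # 75.0
average_score = sum(scores) / len(scores)          # (75.0 + 74.9 + 68.2) / 3 = 72.7
high_performers = sum(s >= 80.0 for s in scores)   # 0

print(top_score, round(average_score, 1), high_performers)
```
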
Leaderboard
3 models ranked by performance on BIG-Bench

Date           License      Score
Feb 15, 2024   Proprietary  75.0%
Jun 27, 2024   Gemma        74.9%
Jun 27, 2024   Gemma        68.2%

Resources