BIG-Bench
Multilingual
text
About
BIG-Bench (Beyond the Imitation Game Benchmark) is a collaborative evaluation suite of more than 200 tasks designed to probe large language models and extrapolate their future capabilities. The tasks cover diverse cognitive abilities, including reasoning, knowledge, language understanding, and specialized skills. The benchmark is used both to assess current model capabilities and to estimate how performance scales as models improve.
Evaluation Stats
Total Models: 3
Organizations: 1
Verified Results: 0
Self-Reported: 2
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers
Score Distribution: 3 models
Top Score: 75.0%
Average Score: 72.7%
High Performers (80%+): 0
Top Organizations
#1 Google (3 models, 72.7%)
Leaderboard
3 models ranked by performance on BIG-Bench
Date | License | Score | Links
---|---|---|---
Feb 15, 2024 | Proprietary | 75.0% |
Jun 27, 2024 | Gemma | 74.9% |
Jun 27, 2024 | Gemma | 68.2% |
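
The summary figures under Performance Overview can be recomputed from the three leaderboard scores. The snippet below is a minimal sketch, not the site's actual aggregation code: the `scores` list is copied by hand from the rows above, the variable names are illustrative, and the 80% threshold comes from the "High Performers (80%+)" label.

```python
from statistics import mean

# Scores copied by hand from the leaderboard rows, in percent.
scores = [75.0, 74.9, 68.2]

# Assumed aggregation: plain max, arithmetic mean rounded to one
# decimal place, and a count of scores at or above 80%.
top_score = max(scores)
average_score = round(mean(scores), 1)
high_performers = sum(1 for s in scores if s >= 80.0)

print(f"Top score: {top_score:.1f}%")          # Top score: 75.0%
print(f"Average score: {average_score:.1f}%")  # Average score: 72.7%
print(f"High performers (80%+): {high_performers}")  # High performers (80%+): 0
```

The results match the published top score of 75.0%, the average of 72.7%, and the zero count of high performers.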