BIG-Bench

Tags: Multilingual, text
About

BIG-Bench (Beyond the Imitation Game Benchmark) is a collaborative evaluation suite of over 200 tasks designed to probe large language models and extrapolate their future capabilities. The benchmark covers diverse cognitive abilities, including reasoning, knowledge, language understanding, and specialized skills. BIG-Bench serves as a foundational evaluation framework for assessing current AI capabilities and for predicting how performance will scale as models improve.
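
To make the task format concrete, here is a minimal sketch of scoring a model on a single BIG-Bench task, assuming the benchmark's JSON task layout in which generative tasks provide an "examples" list of "input"/"target" pairs. The file name "task.json" and the predict_fn callable are placeholders for illustration, not part of any official tooling.

```python
import json


def load_bigbench_examples(path):
    """Load the examples list from a BIG-Bench JSON task file (hypothetical local path)."""
    with open(path) as f:
        task = json.load(f)
    return task["examples"]  # generative tasks: each example has "input" and "target"


def exact_match_accuracy(predict_fn, examples):
    """Score predictions with simple exact match; "target" may be a string or a list of strings."""
    correct = 0
    for ex in examples:
        prediction = predict_fn(ex["input"]).strip()
        targets = ex["target"] if isinstance(ex["target"], list) else [ex["target"]]
        if any(prediction == t.strip() for t in targets):
            correct += 1
    return correct / len(examples)


if __name__ == "__main__":
    examples = load_bigbench_examples("task.json")          # hypothetical file
    score = exact_match_accuracy(lambda prompt: "yes", examples)  # dummy model for illustration
    print(f"exact-match accuracy: {score:.3f}")
```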

Evaluation Stats
Total Models: 3
Organizations: 1
Verified Results: 0
Self-Reported: 2

Benchmark Details
Max Score: 1
Language: en

Performance Overview
Score distribution and top performers

Score Distribution (3 models)
Top Score: 75.0%
Average Score: 72.7%
High Performers (80%+): 0

Top Organizations
#1 Google: 3 models, 72.7%
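
As a quick check, the overview figures above can be recomputed directly from the three scores listed in the Leaderboard section below; this sketch uses only the page's own reported numbers.

```python
# Recompute the Performance Overview figures from the three leaderboard scores on this page.
scores = [75.0, 74.9, 68.2]  # percentages, as reported below

top_score = max(scores)                            # 75.0
average_score = sum(scores) / len(scores)          # (75.0 + 74.9 + 68.2) / 3 = 72.7
high_performers = sum(s >= 80.0 for s in scores)   # 0

print(top_score, round(average_score, 1), high_performers)
```
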
Leaderboard
3 models ranked by performance on BIG-Bench

Date           License      Score
Feb 15, 2024   Proprietary  75.0%
Jun 27, 2024   Gemma        74.9%
Jun 27, 2024   Gemma        68.2%

Resources