BigCodeBench-Full

text

About

BigCodeBench-Full contains the complete set of 1,140 programming tasks in the BigCodeBench suite. It evaluates the coding abilities of large language models across the full range of difficulty and complexity. It is the most extensive variant of the benchmark. Its tasks range from basic implementations to advanced problems that require multi-step reasoning and correct use of library function calls.
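Scores on this page are reported as fractions of a maximum score of 1. The sketch below shows how such a score could be computed. It assumes that the score is the share of the 1,140 tasks whose single generated solution passes all of that task's tests (a pass@1-style pass rate). The function name `benchmark_score`, the task-ID format, and the 565-task example are hypothetical illustrations, not the official BigCodeBench harness.

```python
# Hypothetical sketch: assumes the BigCodeBench-Full score is the fraction of
# the 1,140 tasks whose generated solution passes every test for that task.
TOTAL_TASKS = 1140


def benchmark_score(task_passed: dict[str, bool]) -> float:
    """Return the pass rate in [0, 1], matching a max score of 1."""
    if len(task_passed) != TOTAL_TASKS:
        raise ValueError(f"expected {TOTAL_TASKS} task results, got {len(task_passed)}")
    # True counts as 1 and False as 0, so the sum is the number of passed tasks.
    return sum(task_passed.values()) / TOTAL_TASKS


# Illustrative data only: the first 565 hypothetical task IDs pass.
results = {f"BigCodeBench/{i}": i < 565 for i in range(TOTAL_TASKS)}
score = benchmark_score(results)
print(f"{score:.1%}")  # 49.6%
```

Under this assumption, a reported 49.6% corresponds to roughly 565 or 566 of the 1,140 tasks. Both counts round to 49.6%.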

Evaluation Stats

Total Models: 1

Organizations: 1

Verified Results: 0

Self-Reported: 1

Benchmark Details

Max Score: 1

Performance Overview

Score distribution and top performers

Score Distribution

1 model

Top Score: 49.6%

Average Score: 49.6%

High Performers (80%+): 0

Top Organizations

#1 Alibaba Cloud / Qwen Team: 1 model, 49.6%

Leaderboard

1 model ranked by performance on BigCodeBench-Full

Rank	Model	Organization	Date	License	Score
#01	Qwen2.5-Coder 32B Instruct	Alibaba Cloud / Qwen Team	Sep 19, 2024	Apache 2.0	49.6%
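The overview figures on this page (model and organization counts, top and average score, and the 80%+ high-performer count) can all be derived from the leaderboard rows. The sketch below shows one way to do this. The `overview` function and the row tuple layout are hypothetical. Scores are written as fractions of the max score of 1.

```python
from statistics import mean

# Hypothetical leaderboard rows: (model, organization, score as a fraction of 1).
rows = [("Qwen2.5-Coder 32B Instruct", "Alibaba Cloud / Qwen Team", 0.496)]


def overview(rows, high_threshold=0.80):
    """Summarize leaderboard rows the way the Performance Overview does."""
    scores = [score for _, _, score in rows]
    return {
        "models": len(rows),
        "organizations": len({org for _, org, _ in rows}),  # distinct orgs
        "top_score": max(scores),
        "average_score": mean(scores),
        # Count of models at or above the 80% "high performer" threshold.
        "high_performers": sum(s >= high_threshold for s in scores),
    }


stats = overview(rows)
print(stats)
```

With a single model, the top and average scores coincide at 49.6%. No model reaches the 80% threshold.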

Resources

Research Paper