BigCodeBench-Full

About

BigCodeBench-Full contains the complete set of 1,140 programming tasks from the BigCodeBench suite. It evaluates the coding abilities of large language models across the full range of difficulty and complexity, from basic implementations to advanced algorithmic problems that require multi-step reasoning and function usage.

Evaluation Stats
Total Models
1
Organizations
1
Verified Results
0
Self-Reported
1
Benchmark Details
Max Score
1
Language
en
Performance Overview
Score distribution and top performers

Score Distribution

1 model
Top Score
49.6%
Average Score
49.6%
High Performers (80%+)
0

Top Organizations

#1 Alibaba Cloud / Qwen Team
1 model
49.6%
Leaderboard
1 model ranked by performance on BigCodeBench-Full
License Links
Sep 19, 2024
Apache 2.0
49.6%
Resources