HumanEval-Average
About
HumanEval-Average is a variant of the HumanEval benchmark that aggregates results across multiple evaluation runs (or across different aspects of the original HumanEval coding tasks) into a single averaged score. Averaging reduces run-to-run variance, giving a more stable and reliable measure of a model's code-generation ability.
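As a rough illustration of the aggregation, the sketch below assumes the averaged score is a plain arithmetic mean of per-run HumanEval pass@1 percentages; the helper name humaneval_average and the run scores are hypothetical, not taken from the benchmark itself.

```python
from statistics import mean, pstdev

def humaneval_average(run_scores: list[float]) -> float:
    """Aggregate per-run HumanEval pass@1 scores (as percentages) into one
    averaged score. Assumes a plain arithmetic mean; the benchmark's actual
    aggregation may differ."""
    if not run_scores:
        raise ValueError("at least one evaluation run is required")
    return mean(run_scores)

# Hypothetical per-run pass@1 results for one model (illustrative values only).
runs = [60.4, 62.1, 61.9]
print(f"HumanEval-Average: {humaneval_average(runs):.1f}%")  # averaged score
print(f"Run-to-run std dev: {pstdev(runs):.2f}")             # the variance the average smooths out
```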
Evaluation Stats
Total Models: 1
Organizations: 1
Verified Results: 0
Self-Reported: 1
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers
Score Distribution: 1 model
Top Score: 61.5%
Average Score: 61.5%
High Performers (80%+): 0

Top Organizations
#1 Mistral AI: 1 model, 61.5%
Leaderboard
1 model ranked by performance on HumanEval-Average
Rank | Organization | Release Date | License | Score
---|---|---|---|---
1 | Mistral AI | May 29, 2024 | MNPL-0.1 | 61.5%