HumanEval-Average
About
HumanEval-Average is a variant of the HumanEval benchmark that aggregates results across multiple evaluation runs (or across different aspects of the original HumanEval coding tasks) into a single averaged score. Averaging reduces run-to-run variance, giving a more stable and reliable measure of a model's code-generation ability.
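As a rough illustration of the aggregation, the sketch below assumes the averaged score is a plain arithmetic mean of per-run HumanEval pass@1 percentages; the helper name humaneval_average and the run scores are hypothetical, not taken from the benchmark itself.

```python
from statistics import mean, pstdev

def humaneval_average(run_scores: list[float]) -> float:
    """Aggregate per-run HumanEval pass@1 scores (as percentages) into one
    averaged score. Assumes a plain arithmetic mean; the benchmark's actual
    aggregation may differ."""
    if not run_scores:
        raise ValueError("at least one evaluation run is required")
    return mean(run_scores)

# Hypothetical per-run pass@1 results for one model (illustrative values only).
runs = [60.4, 62.1, 61.9]
print(f"HumanEval-Average: {humaneval_average(runs):.1f}%")  # averaged score
print(f"Run-to-run std dev: {pstdev(runs):.2f}")             # the variance the average smooths out
```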
Evaluation Stats
Total Models: 1
Organizations: 1
Verified Results: 0
Self-Reported: 1
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers
Score Distribution: 1 model
Top Score: 61.5%
Average Score: 61.5%
High Performers (80%+): 0

Top Organizations
#1 Mistral AI: 1 model, 61.5%
Leaderboard
1 model ranked by performance on HumanEval-Average
Rank | Organization | Release Date | License | Score
---|---|---|---|---
1 | Mistral AI | May 29, 2024 | MNPL-0.1 | 61.5%