HumanEval Plus
About
HumanEval-Plus is an enhanced version of the original HumanEval benchmark, featuring additional test cases and more rigorous evaluation criteria for code-generation tasks. The expanded test coverage probes model-written code for correctness and robustness more thoroughly than the original suite, yielding a more reliable assessment of programming capability.
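To illustrate the idea behind the expanded test coverage, here is a minimal sketch of HumanEval-style evaluation: a generated solution is credited only if it passes every test case, and the "Plus" variant adds edge-case tests beyond the original suite. The task, solution, and tests below are hypothetical illustrations, not actual benchmark items.

```python
def candidate_solution(nums):
    """Hypothetical model-generated code for an example task:
    return the sorted unique elements of a list."""
    return sorted(set(nums))

# Original-style tests: sparse coverage of the happy path.
base_tests = [
    (([3, 1, 2],), [1, 2, 3]),
]

# Plus-style additional tests: duplicates, empty input, negatives.
plus_tests = [
    (([2, 2, 1],), [1, 2]),
    (([],), []),
    (([-1, 0, -1],), [-1, 0]),
]

def passes(fn, tests):
    """A solution counts as correct only if it passes all test cases."""
    return all(fn(*args) == expected for args, expected in tests)

# A solution that passes the base tests can still fail the extended
# suite; scoring against both is what makes the evaluation stricter.
print(passes(candidate_solution, base_tests + plus_tests))
```

A buggy solution that happens to satisfy the sparse original tests (for example, one that mishandles empty input) would be caught by the extended suite, which is the mechanism behind the benchmark's more rigorous scoring.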
Evaluation Stats
Total Models: 1
Organizations: 1
Verified Results: 0
Self-Reported: 1
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers
Score Distribution: 1 model
Top Score: 92.9%
Average Score: 92.9%
High Performers (80%+): 1

Top Organizations
#1 Mistral AI (1 model): 92.9%
Leaderboard
1 model ranked by performance on HumanEval Plus
| Date | License | Score |
|---|---|---|
| Jun 20, 2025 | Apache 2.0 | 92.9% |