HumanEval Plus
About
HumanEval-Plus is an enhanced version of the original HumanEval benchmark, featuring additional test cases and more rigorous evaluation criteria for code-generation tasks. The expanded test coverage probes model-written code for correctness and robustness more thoroughly than the original suite, yielding a more reliable assessment of programming capability.
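To illustrate the idea behind the expanded test coverage, here is a minimal sketch of HumanEval-style evaluation: a generated solution is credited only if it passes every test case, and the "Plus" variant adds edge-case tests beyond the original suite. The task, solution, and tests below are hypothetical illustrations, not actual benchmark items.

```python
def candidate_solution(nums):
    """Hypothetical model-generated code for an example task:
    return the sorted unique elements of a list."""
    return sorted(set(nums))

# Original-style tests: sparse coverage of the happy path.
base_tests = [
    (([3, 1, 2],), [1, 2, 3]),
]

# Plus-style additional tests: duplicates, empty input, negatives.
plus_tests = [
    (([2, 2, 1],), [1, 2]),
    (([],), []),
    (([-1, 0, -1],), [-1, 0]),
]

def passes(fn, tests):
    """A solution counts as correct only if it passes all test cases."""
    return all(fn(*args) == expected for args, expected in tests)

# A solution that passes the base tests can still fail the extended
# suite; scoring against both is what makes the evaluation stricter.
print(passes(candidate_solution, base_tests + plus_tests))
```

A buggy solution that happens to satisfy the sparse original tests (for example, one that mishandles empty input) would be caught by the extended suite, which is the mechanism behind the benchmark's more rigorous scoring.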
Evaluation Stats
Total Models: 1
Organizations: 1
Verified Results: 0
Self-Reported: 1
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers
Score Distribution: 1 model
Top Score: 92.9%
Average Score: 92.9%
High Performers (80%+): 1

Top Organizations
#1 Mistral AI (1 model): 92.9%
Leaderboard
1 model ranked by performance on HumanEval Plus
| Date | License | Score |
|---|---|---|
| Jun 20, 2025 | Apache 2.0 | 92.9% |