Instruct HumanEval

About

Instruct-HumanEval is an instruction-based variant of the HumanEval benchmark that evaluates how well AI models generate code from natural language instructions. Models must understand programming requirements expressed as instructions and translate them into functional implementations, so the benchmark measures both instruction comprehension and code generation accuracy.
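As a rough illustration of how this style of evaluation works (a minimal sketch, not the official harness; the example task and the generate_code stand-in are hypothetical), each problem pairs a natural-language instruction with unit tests, and a completion counts as passing only if the generated code runs and the tests raise no errors:

# Minimal sketch of an instruction-to-code evaluation loop.
# The task dictionary and generate_code() are hypothetical placeholders,
# not part of the official Instruct-HumanEval harness.

task = {
    "instruction": "Write a function `add(a, b)` that returns the sum of two integers.",
    "tests": "assert add(2, 3) == 5\nassert add(-1, 1) == 0",
}

def generate_code(instruction: str) -> str:
    """Stand-in for a model call that turns an instruction into Python code."""
    return "def add(a, b):\n    return a + b"

def passes_tests(code: str, tests: str) -> bool:
    """Run the generated code and its unit tests in a shared namespace."""
    namespace: dict = {}
    try:
        exec(code, namespace)    # define the candidate function
        exec(tests, namespace)   # assertions fail if the behavior is wrong
        return True
    except Exception:
        return False

completion = generate_code(task["instruction"])
print("pass" if passes_tests(completion, task["tests"]) else "fail")

Reported scores such as the 73.8% below correspond to the fraction of benchmark problems whose generated solutions pass their tests.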

Evaluation Stats
Total Models: 1
Organizations: 1
Verified Results: 0
Self-Reported: 1
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers

Score Distribution: 1 model
Top Score: 73.8%
Average Score: 73.8%
High Performers (80%+): 0

Top Organizations

#1 NVIDIA: 1 model, 73.8%
Leaderboard
1 model ranked by performance on Instruct HumanEval

Score: 73.8%
Date: Oct 1, 2024
License: Llama 3.1 Community License
Resources