Instruct HumanEval
About
Instruct-HumanEval is an instruction-based variant of the HumanEval benchmark: each programming task is expressed as a natural-language instruction, and the model must translate it into a functionally correct implementation. The benchmark therefore measures both instruction comprehension and code generation accuracy.
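Like the original HumanEval, correctness is judged by executing the generated code against the problem's unit tests. Below is a minimal sketch of that check; the `problem` dict, its field names, and the `passes_tests` helper are illustrative assumptions rather than the benchmark's actual schema or harness, and a real harness would sandbox execution and enforce timeouts.

```python
# Minimal sketch of HumanEval-style scoring by functional correctness.
# Field names and the helper below are illustrative assumptions, not the
# benchmark's actual schema or evaluation harness.

problem = {
    "instruction": "Write a function add(a, b) that returns the sum of two integers.",
    "test": (
        "def check(candidate):\n"
        "    assert candidate(2, 3) == 5\n"
        "    assert candidate(-1, 1) == 0\n"
    ),
    "entry_point": "add",
}

# Stand-in for the code a model generates from the instruction.
model_completion = "def add(a, b):\n    return a + b\n"


def passes_tests(completion: str, prob: dict) -> bool:
    """Return True if the generated code passes the problem's unit tests."""
    namespace: dict = {}
    try:
        exec(completion, namespace)        # defines the candidate function
        exec(prob["test"], namespace)      # defines check()
        namespace["check"](namespace[prob["entry_point"]])
        return True
    except Exception:
        return False


print(passes_tests(model_completion, problem))  # True
```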
Evaluation Stats
Total Models: 1
Organizations: 1
Verified Results: 0
Self-Reported: 1
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers
Score Distribution: 1 model
Top Score: 73.8%
Average Score: 73.8%
High Performers (80%+): 0
Top Organizations: #1 NVIDIA (1 model, 73.8%)
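For reference, HumanEval-style results are conventionally reported as pass@k, the probability that at least one of k sampled completions passes the unit tests; treating the 73.8% above as pass@1 is an assumption, since the page does not state the metric. A short sketch of the standard unbiased estimator:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem: n samples drawn, c correct."""
    if n - c < k:
        return 1.0  # every size-k subset contains at least one correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 10 samples per problem, 7 passing -> pass@1 estimate of 0.7;
# the benchmark score is this value averaged over all problems.
print(pass_at_k(10, 7, 1))  # 0.7
```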
Leaderboard
1 model ranked by performance on Instruct HumanEval
Release Date | License | Score
---|---|---
Oct 1, 2024 | Llama 3.1 Community License | 73.8%