Instruct HumanEval

About

Instruct-HumanEval is an instruction-based variant of the HumanEval benchmark that evaluates how well AI models generate code from natural language instructions. Models must understand programming requirements expressed as instructions and translate them into functional implementations, so the benchmark measures both instruction comprehension and code generation accuracy.
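As a rough illustration of how this style of evaluation works (a minimal sketch, not the official harness; the example task and the generate_code stand-in are hypothetical), each problem pairs a natural-language instruction with unit tests, and a completion counts as passing only if the generated code runs and the tests raise no errors:

# Minimal sketch of an instruction-to-code evaluation loop.
# The task dictionary and generate_code() are hypothetical placeholders,
# not part of the official Instruct-HumanEval harness.

task = {
    "instruction": "Write a function `add(a, b)` that returns the sum of two integers.",
    "tests": "assert add(2, 3) == 5\nassert add(-1, 1) == 0",
}

def generate_code(instruction: str) -> str:
    """Stand-in for a model call that turns an instruction into Python code."""
    return "def add(a, b):\n    return a + b"

def passes_tests(code: str, tests: str) -> bool:
    """Run the generated code and its unit tests in a shared namespace."""
    namespace: dict = {}
    try:
        exec(code, namespace)    # define the candidate function
        exec(tests, namespace)   # assertions fail if the behavior is wrong
        return True
    except Exception:
        return False

completion = generate_code(task["instruction"])
print("pass" if passes_tests(completion, task["tests"]) else "fail")

Reported scores such as the 73.8% below correspond to the fraction of benchmark problems whose generated solutions pass their tests.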

Evaluation Stats
Total Models: 1
Organizations: 1
Verified Results: 0
Self-Reported: 1
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers

Score Distribution: 1 model
Top Score: 73.8%
Average Score: 73.8%
High Performers (80%+): 0

Top Organizations

#1 NVIDIA: 1 model, 73.8%
Leaderboard
1 model ranked by performance on Instruct HumanEval

Score: 73.8%
Date: Oct 1, 2024
License: Llama 3.1 Community License
Resources