Multipl-E HumanEval

Multilingual

text

About

MultiPL-E HumanEval is a variant of the MultiPL-E benchmark specifically based on the HumanEval dataset, extending it to multiple programming languages. It evaluates code generation models' ability to solve the same programming problems across different languages, testing cross-language code generation consistency and programming language understanding.

Evaluation Stats

Total Models3

Organizations1

Verified Results0

Self-Reported3

Benchmark Details

Max Score1

Language

Performance Overview

Score distribution and top performers

Score Distribution

3 models

Top Score

75.2%

Average Score

63.8%

High Performers (80%+)

Top Organizations

#1Meta

3 models

63.8%

Leaderboard

3 models ranked by performance on Multipl-E HumanEval

			License
#01Llama 3.1 405B Instruct	Meta	Jul 23, 2024	Llama 3.1 Community License	75.2%
#02Llama 3.1 70B Instruct	Meta	Jul 23, 2024	Llama 3.1 Community License	65.5%
#03Llama 3.1 8B Instruct	Meta	Jul 23, 2024	Llama 3.1 Community License	50.8%

Resources

Research Paper