MultiPL-E MBPP

Multilingual

text

About

MultiPL-E MBPP extends the Mostly Basic Python Problems (MBPP) benchmark to multiple programming languages through the MultiPL-E framework. This variant evaluates models' ability to solve basic programming problems across diverse languages, testing fundamental coding skills, algorithm implementation, and programming language syntax understanding.
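
As a rough illustration of what "solving a basic programming problem" means here, MBPP-style tasks pair a short description with a few assert-based test cases, and a generated solution counts as correct only if every assertion passes. The sketch below is a minimal, unsandboxed Python version of that check. The function name `passes_tests` and the sample problem are hypothetical and not taken from the benchmark, and a real harness would run translated tests in each target language inside an isolated process.

```python
def passes_tests(candidate_src: str, test_asserts: list[str]) -> bool:
    """Return True if the candidate source passes every assert statement.

    Hypothetical helper for illustration only; it runs code in-process
    with no sandboxing or timeout.
    """
    namespace: dict = {}
    try:
        exec(candidate_src, namespace)   # define the candidate function
        for test in test_asserts:
            exec(test, namespace)        # raises AssertionError on failure
    except Exception:
        return False
    return True


# Hypothetical problem, not drawn from the benchmark.
candidate = "def add(a, b):\n    return a + b\n"
tests = ["assert add(1, 2) == 3", "assert add(-1, 1) == 0"]

print(passes_tests(candidate, tests))
```

A score on the leaderboard would then correspond to the fraction of problems for which such a check succeeds, assuming a single sample per problem.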

Evaluation Stats

Total Models: 3

Organizations: 1

Verified Results: 0

Self-Reported: 3

Benchmark Details

Max Score: 1

Language

Performance Overview

Score distribution and top performers

Score Distribution

3 models

Top Score

65.7%

Average Score

60.0%

High Performers (80%+)

Top Organizations

#1 Meta

3 models

60.0%

Leaderboard

3 models ranked by performance on MultiPL-E MBPP

Rank	Model	Organization	Date	License	Score
#01	Llama 3.1 405B Instruct	Meta	Jul 23, 2024	Llama 3.1 Community License	65.7%
#02	Llama 3.1 70B Instruct	Meta	Jul 23, 2024	Llama 3.1 Community License	62.0%
#03	Llama 3.1 8B Instruct	Meta	Jul 23, 2024	Llama 3.1 Community License	52.4%
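
The summary figures reported above (top score, average score, and the 80%+ high-performer count) can be recomputed from the three leaderboard rows. The following is a small sketch that assumes the average is a plain unweighted mean of the listed scores; the variable names are illustrative only.

```python
# Scores copied from the leaderboard rows, in percent.
scores = {
    "Llama 3.1 405B Instruct": 65.7,
    "Llama 3.1 70B Instruct": 62.0,
    "Llama 3.1 8B Instruct": 52.4,
}

top_model = max(scores, key=scores.get)
top_score = scores[top_model]

# Assumption: the page's average is an unweighted mean over all models.
average = sum(scores.values()) / len(scores)

# Models at or above the 80% threshold used by the "High Performers" label.
high_performers = [name for name, s in scores.items() if s >= 80.0]

print(f"Top score: {top_score:.1f}% ({top_model})")
print(f"Average score: {average:.1f}%")
print(f"High performers (80%+): {len(high_performers)}")
```

Under that assumption the mean rounds to the 60.0% shown on the page, and no listed model reaches the 80% threshold.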
