MBPP

About

MBPP (Mostly Basic Python Problems) is a foundational code generation benchmark featuring 974 Python programming tasks with natural language descriptions and test cases. Created by Google Research, this dataset evaluates AI models' ability to synthesize basic Python code from prompts, testing fundamental programming skills and algorithmic thinking through straightforward coding problems with automated verification.
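Each MBPP task pairs a short natural-language prompt with a reference solution and a list of assert-style test cases; a model's completion is scored by executing it against those asserts. The sketch below is a minimal illustration of that verification loop, assuming the public Hugging Face copy of MBPP and its `text`, `code`, and `test_list` fields (field names may differ in other mirrors), and it is not a hardened harness: executing untrusted model output this way should be sandboxed in practice.

# Minimal sketch of MBPP-style automated verification.
# Assumes the Hugging Face "mbpp" dataset with fields such as
# `text` (prompt), `code` (reference solution), and `test_list`
# (assert statements); adapt the names if your copy differs.
from datasets import load_dataset

def passes_tests(candidate_code: str, test_list: list[str]) -> bool:
    """Run candidate code, then each assert; any exception counts as failure."""
    namespace: dict = {}
    try:
        exec(candidate_code, namespace)   # define the candidate function(s)
        for test in test_list:
            exec(test, namespace)         # e.g. an "assert some_func(...) == ..." line
        return True
    except Exception:
        return False

ds = load_dataset("mbpp", split="test")
task = ds[0]
print(task["text"])                       # natural-language problem statement

# Here the reference solution is graded against its own tests; in a real
# evaluation the model-generated completion would take its place.
print(passes_tests(task["code"], task["test_list"]))

A full evaluation simply repeats this loop over all tasks and reports the fraction that pass, which is how the percentage scores in the leaderboard below are derived.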

Evaluation Stats

Total Models: 31
Organizations: 6
Verified Results: 0
Self-Reported: 31
Benchmark Details

Max Score: 100
Language: en
Performance Overview

Score distribution and top performers

Score Distribution (31 models)
Top Score: 91.3%
Average Score: 73.0%
High Performers (80%+): 11

Top Organizations

#1 NVIDIA (2 models, 87.9%)
#2 Alibaba Cloud / Qwen Team (11 models, 81.2%)
#3 Microsoft (2 models, 75.2%)
#4 Mistral AI (3 models, 74.2%)
#5 Meta (2 models, 72.7%)
Leaderboard
31 models ranked by performance on MBPP

Date          License                        Score
Mar 18, 2025  Llama 3.1 Community License    91.3%
Sep 19, 2024  Apache 2.0                     90.2%
Sep 19, 2024  Qwen                           88.2%
Mar 18, 2025  Llama 3.1 Community License    84.6%
Sep 19, 2024  Apache 2.0                     84.0%
Feb 28, 2025  Apache 2.0                     84.0%
Sep 19, 2024  Apache 2.0                     83.5%
Sep 19, 2024  Apache 2.0                     82.0%
Apr 29, 2025  Apache 2.0                     81.4%
Aug 23, 2024  MIT                            80.8%

Showing 1 to 10 of 31 models
Resources