MBPP
Coding
About
MBPP (Mostly Basic Python Problems) is a benchmark of 974 crowd-sourced Python programming problems designed to be solvable by entry-level programmers, covering programming fundamentals and standard library functionality.
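
To make the task format concrete, here is a minimal sketch of how an MBPP-style problem can be checked: a candidate function is executed against a few `assert` statements. The problem text, the function name `remove_duplicates`, and the helper `passes_all_tests` are hypothetical and written for illustration only; they are not taken from the dataset or from any official evaluation harness.

```python
# Hypothetical MBPP-style task (illustrative only, not from the dataset):
# "Write a function to remove duplicates from a list while preserving order."

candidate_solution = '''
def remove_duplicates(items):
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result
'''

# Assert-style test cases, one string per check (assumed format).
test_cases = [
    "assert remove_duplicates([1, 2, 2, 3, 1]) == [1, 2, 3]",
    "assert remove_duplicates([]) == []",
    "assert remove_duplicates(['a', 'b', 'a']) == ['a', 'b']",
]


def passes_all_tests(solution_code: str, tests: list[str]) -> bool:
    """Return True if the solution runs and satisfies every assert."""
    namespace: dict = {}
    try:
        exec(solution_code, namespace)  # define the candidate function
        for test in tests:
            exec(test, namespace)       # raises AssertionError on failure
    except Exception:
        return False
    return True


print(passes_all_tests(candidate_solution, test_cases))
```

Running generated code with `exec` is unsafe outside a sandbox, so a real harness would isolate execution and enforce timeouts. A benchmark score would then be the percentage of problems for which a check like this succeeds.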
Evaluation Stats
Total Models: 20
Organizations: 6
Verified Results: 0
Self-Reported: 0
Benchmark Details
Max Score: 100
Performance Overview
Score distribution and top performers
Score Distribution: 20 models
Top Score: 91.3%
Average Score: 80.3%
High Performers (80%+): 11
Top Organizations
| Rank | Organization | Models | Score |
|---|---|---|---|
| #1 | NVIDIA | 2 | 87.9% |
| #2 | Alibaba / Qwen | 10 | 82.6% |
| #3 | Microsoft | 1 | 80.8% |
| #4 | Meta AI | 1 | 77.6% |
| #5 | Google DeepMind | 3 | 74.5% |
Leaderboard
20 models ranked by performance on MBPP
| Model | Date | License | Score |
|---|---|---|---|
| Llama-3.3 Nemotron Super 49B | Mar 1, 2025 | Apache 2.0 | 91.3% |
| Qwen2.5-Coder 32B Instruct | Nov 12, 2024 | Apache 2.0 | 90.2% |
| Qwen2.5 72B Instruct | Sep 19, 2024 | Apache 2.0 | 88.2% |
| Llama 3.1 Nemotron Nano 8B | Jan 6, 2025 | Apache 2.0 | 84.6% |
| Qwen2.5 32B Instruct | Sep 19, 2024 | Apache 2.0 | 84.0% |
| Qwen2.5-VL 32B Instruct | Mar 1, 2025 | Apache 2.0 | 84.0% |
| Qwen2.5-Coder 7B Instruct | Nov 12, 2024 | Apache 2.0 | 83.5% |
| Qwen2.5 14B Instruct | Sep 19, 2024 | Apache 2.0 | 82.0% |
| Qwen3-235B-A22B | Apr 28, 2025 | Apache 2.0 | 81.4% |
| Phi-3.5-MoE Instruct | Aug 22, 2024 | MIT | 80.8% |
Showing the top 10 of 20 models
Additional Metrics
Extended metrics for top models on MBPP
| Model | Score | Cost | Size | Context |
|---|---|---|---|---|
| Llama-3.3 Nemotron Super 49B | 91.3 | — | 50B | — |
| Qwen2.5-Coder 32B Instruct | 90.2 | $0.09 / $0.09 | 32B | 128K |
| Qwen2.5 72B Instruct | 88.2 | $0.35 / $0.40 | 73B | 131K |
| Llama 3.1 Nemotron Nano 8B | 84.6 | — | 8B | — |
| Qwen2.5 32B Instruct | 84.0 | — | 33B | — |
| Qwen2.5-VL 32B Instruct | 84.0 | — | 34B | — |
| Qwen2.5-Coder 7B Instruct | 83.5 | — | 7B | — |
| Qwen2.5 14B Instruct | 82.0 | — | 15B | — |
| Qwen3-235B-A22B | 81.4 | $0.10 / $0.10 | 235B | 128K |
| Phi-3.5-MoE Instruct | 80.8 | — | 60B | — |
| Qwen2 72B Instruct | 80.2 | — | 72B | — |
| Qwen2.5 7B Instruct | 79.2 | $0.30 / $0.30 | 8B | 131K |
| Codestral 22B | 78.2 | — | 22B | — |
| Llama 4 Maverick | 77.6 | $0.17 / $0.60 | 400B | 1.0M |
| Gemini Diffusion | 76.0 | — | — | — |
| Mistral Small 3.1 24B Instruct | 74.7 | — | 24B | — |
| Gemma 3 27B | 74.4 | $0.10 / $0.20 | 27B | 131K |
| Qwen2.5-Omni-7B | 73.2 | — | 7B | — |
| Gemma 3 12B | 73.0 | $0.05 / $0.10 | 12B | 131K |
| Mistral Small 3 24B | 69.6 | — | 24B | — |
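
The summary figures in the Performance Overview can be cross-checked against the scores in this table. The sketch below assumes the overview statistics are a plain mean and a count of scores at or above 80; the page does not state how they are computed, and the variable names are illustrative.

```python
from statistics import mean

# Scores copied from the Additional Metrics table, in table order.
scores = [
    91.3, 90.2, 88.2, 84.6, 84.0, 84.0, 83.5, 82.0, 81.4, 80.8,
    80.2, 79.2, 78.2, 77.6, 76.0, 74.7, 74.4, 73.2, 73.0, 69.6,
]

top_score = max(scores)
average = mean(scores)

# Assumption: "High Performers (80%+)" counts scores of 80.0 or higher.
high_performers = sum(1 for s in scores if s >= 80.0)

print(f"Models: {len(scores)}")
print(f"Top score: {top_score:.1f}%")
print(f"Average score: {average:.1f}%")
print(f"High performers (80%+): {high_performers}")
```

Under these assumptions the results agree with the overview: 20 models, a top score of 91.3%, an average of 80.3%, and 11 high performers.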