MBPP EvalPlus

About

MBPP EvalPlus is an augmented version of the MBPP benchmark, developed as part of the EvalPlus framework, with substantially expanded test coverage. It evaluates AI models' Python programming ability, and its larger test suites assess code correctness and edge-case handling more thoroughly than the original MBPP.
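To illustrate why expanded test coverage matters, here is a minimal sketch, not taken from the benchmark itself. The task, the `largest` candidate solution, the `is_passing` helper, and both test lists are hypothetical names invented for this example. It shows a buggy solution that passes a small base test set but fails once an extra edge-case test is added.

```python
def is_passing(candidate, tests):
    """Return True only if candidate(*args) == expected for every test."""
    for args, expected in tests:
        try:
            if candidate(*args) != expected:
                return False
        except Exception:
            # Any runtime error counts as a failed test.
            return False
    return True


# Hypothetical candidate solution: returns the largest element of a list,
# but wrongly assumes at least one value is non-negative.
def largest(nums):
    best = 0  # bug: should start from the first element
    for n in nums:
        if n > best:
            best = n
    return best


# Hypothetical small base test set (all inputs are positive).
base_tests = [
    (([1, 5, 3],), 5),
    (([2, 2],), 2),
]

# Hypothetical expanded test set: base tests plus one edge case.
extra_tests = base_tests + [
    (([-4, -2, -9],), -2),
]

base_ok = is_passing(largest, base_tests)
plus_ok = is_passing(largest, extra_tests)
print(base_ok, plus_ok)
```

Under these assumptions, the same solution is counted as correct by the smaller test set and incorrect by the expanded one, which is the kind of gap that additional tests are meant to expose.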

Evaluation Stats
Total Models: 2
Organizations: 1
Verified Results: 0
Self-Reported: 2
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers

Score Distribution

Models: 2
Top Score: 88.6%
Average Score: 88.1%
High Performers (80%+): 2

Top Organizations

#1 Meta, 2 models, 88.1%
Leaderboard
2 models ranked by performance on MBPP EvalPlus
Date          License                                 Score
Jul 23, 2024  Llama 3.1 Community License             88.6%
Dec 6, 2024   Llama 3.3 Community License Agreement   87.6%
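The summary figures in the Performance Overview follow from the two leaderboard scores. The sketch below recomputes them. The variable names are illustrative, and treating "High Performers" as scores of 80% or above is taken from the "80%+" label on this page.

```python
# Scores (in percent) from the two leaderboard entries above.
scores = [88.6, 87.6]

# Highest score on the leaderboard.
top_score = max(scores)

# Mean score, rounded to one decimal place as displayed on the page.
average_score = round(sum(scores) / len(scores), 1)

# Number of models at or above the 80% threshold.
high_performers = sum(1 for s in scores if s >= 80.0)

print(f"Top Score: {top_score}%")
print(f"Average Score: {average_score}%")
print(f"High Performers (80%+): {high_performers}")
```

This reproduces the listed top score of 88.6%, average score of 88.1%, and two high performers.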
Resources