MBPP EvalPlus
About
MBPP EvalPlus is an augmented version of the MBPP benchmark, developed as part of the EvalPlus framework. It substantially expands the test suite for each problem, so it assesses the correctness of model-generated Python code, including edge-case handling, more rigorously than the original MBPP.
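To illustrate why expanded test coverage matters, here is a minimal sketch. It does not use the EvalPlus tooling or any real benchmark problem: the task, the `candidate_max` function, and both test lists are hypothetical and exist only to show how a solution can pass a small base suite yet fail once edge cases are added.

```python
from typing import Callable, List, Tuple

# Hypothetical task (not from the benchmark): return the largest number in a list.
def candidate_max(nums: List[int]) -> int:
    best = 0  # Bug: starting at 0 gives the wrong answer for all-negative input.
    for n in nums:
        if n > best:
            best = n
    return best

TestCase = Tuple[List[int], int]  # (input, expected output)

# Hypothetical small base suite, in the spirit of an original benchmark.
base_tests: List[TestCase] = [([1, 2, 3], 3), ([5, 1], 5)]

# Hypothetical extra edge cases, in the spirit of an augmented benchmark.
extra_tests: List[TestCase] = [([-3, -1, -2], -1), ([0], 0)]

def passes_all(solution: Callable[[List[int]], int], tests: List[TestCase]) -> bool:
    """Return True only if the solution gives the expected output on every test."""
    for args, expected in tests:
        try:
            if solution(args) != expected:
                return False
        except Exception:
            # A crash counts as a failed test.
            return False
    return True

passes_base = passes_all(candidate_max, base_tests)
passes_augmented = passes_all(candidate_max, base_tests + extra_tests)
print(f"base suite: {passes_base}, augmented suite: {passes_augmented}")
```

Under these assumptions the buggy solution is accepted by the base suite and rejected by the augmented one, which is the kind of gap that a larger test suite is meant to expose.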
Evaluation Stats
Total Models: 2
Organizations: 1
Verified Results: 0
Self-Reported: 2
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers
Score Distribution: 2 models
Top Score: 88.6%
Average Score: 88.1%
High Performers (80%+): 2
Top Organizations
#1 Meta (2 models, 88.1%)
Leaderboard
2 models ranked by performance on MBPP EvalPlus
Release Date | License | Score
---|---|---
Jul 23, 2024 | Llama 3.1 Community License | 88.6%
Dec 6, 2024 | Llama 3.3 Community License Agreement | 87.6%
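The summary figures in the Performance Overview can be checked against the two leaderboard scores. The sketch below assumes the average is a plain arithmetic mean rounded to one decimal place and that "High Performers" counts scores of at least 80%; the variable names are illustrative only.

```python
# Scores (in percent) copied from the two leaderboard rows.
scores = [88.6, 87.6]

# Assumption: the top score is the maximum of the listed scores.
top_score = max(scores)

# Assumption: the average is an arithmetic mean rounded to one decimal place.
average_score = round(sum(scores) / len(scores), 1)

# Assumption: "High Performers (80%+)" counts scores of at least 80%.
high_performers = sum(1 for s in scores if s >= 80.0)

print(f"top={top_score}%, average={average_score}%, high performers={high_performers}")
```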