MultiPL-E
Multilingual · text
About
MultiPL-E is a multilingual benchmark for evaluating the code-generation performance of large language models. It extends existing code benchmarks (HumanEval and MBPP) to a diverse set of programming languages, testing a model's ability to generate syntactically correct and functionally accurate code across varied programming paradigms and language ecosystems.
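To make the setup concrete, here is a minimal sketch of loading MultiPL-E problems from the Hugging Face Hub. The dataset path nuprl/MultiPL-E, the humaneval-java config name, and the field names are assumptions based on the public release rather than details from this page; check the dataset card before relying on them.

```python
# Minimal sketch: loading MultiPL-E problems for one target language.
# ASSUMPTION: the public Hugging Face release at "nuprl/MultiPL-E" with
# per-language configs such as "humaneval-java" (HumanEval -> Java).
from datasets import load_dataset

problems = load_dataset("nuprl/MultiPL-E", "humaneval-java", split="test")

# Each record pairs a translated prompt with the unit tests used to
# judge whether generated code is functionally correct.
for problem in problems.select(range(2)):
    print(problem["name"])    # problem identifier
    print(problem["prompt"])  # function signature and docstring in Java
    print(problem["tests"])   # unit tests appended to the model's output
```

In the standard MultiPL-E workflow, the model completes each prompt, the completion is concatenated with the tests, and the resulting program is executed in a sandbox; a problem counts as solved only if the tests pass.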
Evaluation Stats
Total Models: 12
Organizations: 2
Verified Results: 0
Self-Reported: 12
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers

Score Distribution: 12 models
Top Score: 87.9%
Average Score: 75.1%
High Performers (80%+): 4

Top Organizations
#1 Moonshot AI (2 models): 85.7%
#2 Alibaba Cloud / Qwen Team (10 models): 72.9%
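For readers cross-checking the dashboard numbers, here is a minimal sketch of how such summary statistics are typically derived from per-model scores. The list below is illustrative, taken from the 10 of 12 scores visible in the leaderboard table, so the computed average will not exactly match the 75.1% reported over all 12 models.

```python
# Illustrative sketch: deriving summary stats from per-model scores.
# Only 10 of the 12 scores appear on this page, so the computed
# average differs from the dashboard's 75.1% over all 12 models.
scores = [87.9, 87.8, 85.7, 85.7, 75.4, 75.1, 72.8, 70.4, 69.2, 65.9]

top_score = max(scores)
average_score = sum(scores) / len(scores)
high_performers = sum(1 for s in scores if s >= 80.0)  # the "80%+" bucket

print(f"Top Score: {top_score:.1f}%")                # 87.9%
print(f"Average Score: {average_score:.1f}%")        # 77.6% over the visible 10
print(f"High Performers (80%+): {high_performers}")  # 4, matching the dashboard
```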
Leaderboard
12 models ranked by performance on MultiPL-E
Date | License | Score
---|---|---
Jul 22, 2025 | Apache 2.0 | 87.9%
Sep 10, 2025 | Apache 2.0 | 87.8%
Jul 11, 2025 | MIT | 85.7%
Sep 5, 2025 | MIT | 85.7%
Sep 19, 2024 | Apache 2.0 | 75.4%
Sep 19, 2024 | Qwen | 75.1%
Sep 19, 2024 | Apache 2.0 | 72.8%
Sep 19, 2024 | Apache 2.0 | 70.4%
Jul 23, 2024 | tongyi-qianwen | 69.2%
Apr 29, 2025 | Apache 2.0 | 65.9%
Showing 1 to 10 of 12 models