HumanEval-Mul
Multilingual · text
About
HumanEval-Mul is a multilingual extension of the HumanEval benchmark. Where the original HumanEval evaluates Python code generation, HumanEval-Mul tests a model's ability to produce functionally correct solutions in multiple programming languages, measuring cross-language competency and algorithmic reasoning across diverse syntaxes and paradigms.
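The sketch below illustrates how a HumanEval-style benchmark scores functional correctness: each task pairs a prompt with unit tests, and a completion counts as a pass only if the tests run cleanly. This is a minimal illustration, and the `Task` fields and the `passes` / `pass_at_1` helpers are hypothetical names; the actual HumanEval-Mul harness runs sandboxed per-language toolchains rather than a bare `exec`.

```python
# Minimal sketch of HumanEval-style functional-correctness scoring.
# Assumes each task bundles a prompt, a model-generated completion,
# and a test snippet -- the real HumanEval-Mul harness may differ.
from dataclasses import dataclass

@dataclass
class Task:
    prompt: str       # function signature/docstring shown to the model
    completion: str   # code generated by the model
    tests: str        # assertions exercising the generated function

def passes(task: Task) -> bool:
    """Run the candidate program plus its tests; any exception = failure."""
    program = task.prompt + task.completion + "\n" + task.tests
    try:
        exec(program, {})  # non-Python targets would shell out to that language's toolchain
        return True
    except Exception:
        return False

def pass_at_1(tasks: list[Task]) -> float:
    """Fraction of tasks whose single completion passes all tests."""
    return sum(passes(t) for t in tasks) / len(tasks)

# Example: a trivial Python task.
task = Task(
    prompt="def add(a, b):\n",
    completion="    return a + b\n",
    tests="assert add(2, 3) == 5\n",
)
print(f"pass@1 = {pass_at_1([task]):.1%}")  # -> pass@1 = 100.0%
```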
Evaluation Stats
Total Models: 2
Organizations: 1
Verified Results: 0
Self-Reported: 2
Benchmark Details
Max Score: 1
Language: en
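A maximum score of 1 presumably denotes a normalized pass rate, so the percentages elsewhere on this page map onto it directly (e.g., the top score of 82.6% corresponds to 0.826).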
Performance Overview
Score distribution and top performers
Score Distribution: 2 models
Top Score: 82.6%
Average Score: 78.2%
High Performers (80%+): 1

Top Organizations
#1 DeepSeek: 2 models, 78.2% average score
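With only two self-reported entries, the average follows directly from the leaderboard scores: (82.6% + 73.8%) / 2 = 78.2%, which also matches the per-model mean shown for DeepSeek, the sole listed organization.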
Leaderboard
2 models ranked by performance on HumanEval-Mul
| Rank | Release Date | License | Score |
|---|---|---|---|
| 1 | Dec 25, 2024 | MIT + Model License (commercial use allowed) | 82.6% |
| 2 | May 8, 2024 | DeepSeek License | 73.8% |