HumanEval-Mul

Multilingual | text
About

HumanEval-Mul is a multilingual extension of the HumanEval benchmark that evaluates AI models' code generation across multiple programming languages. It tests a model's ability to produce functional code in languages beyond Python, measuring cross-language programming competence and algorithmic reasoning across different paradigms and syntaxes.
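
HumanEval-style benchmarks score functional correctness: a completion counts as solved only if it runs and passes the task's unit tests, and results are typically reported as pass@1 (one sample per task). The sketch below illustrates that scoring loop for the Python case only; the task fields (prompt, test_code, entry_point) follow the original HumanEval layout and are an assumption here, as is the use of in-process exec rather than the sandboxed, per-language harness a real evaluation would use.

```python
# Minimal sketch of HumanEval-style functional-correctness scoring (Python case only).
# Task layout and field names are assumptions based on the original HumanEval format,
# not taken from the HumanEval-Mul release.
from dataclasses import dataclass

@dataclass
class Task:
    prompt: str        # function signature + docstring shown to the model
    test_code: str     # unit tests that define check(candidate)
    entry_point: str   # name of the function under test

def passes(task: Task, completion: str) -> bool:
    """Return True if the model's completion passes the task's unit tests."""
    program = task.prompt + completion + "\n" + task.test_code
    scope: dict = {}
    try:
        exec(program, scope)                      # define the candidate function and tests
        scope["check"](scope[task.entry_point])   # run the tests against the candidate
        return True
    except Exception:                             # compile error, wrong answer, or crash
        return False

def pass_at_1(tasks, completions) -> float:
    """Percentage of tasks solved with a single sample per task (pass@1)."""
    solved = sum(passes(t, c) for t, c in zip(tasks, completions))
    return 100.0 * solved / len(tasks)
```

A real multilingual harness would invoke each language's toolchain (compiler or interpreter plus the translated tests) in a sandbox with timeouts; the in-process exec above is only meant to show the per-task pass/fail decision.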

Evaluation Stats
Total Models: 2
Organizations: 1
Verified Results: 0
Self-Reported: 2
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers

Score Distribution (2 models)
Top Score: 82.6%
Average Score: 78.2%
High Performers (80%+): 1

Top Organizations

#1 DeepSeek: 2 models, 78.2% average
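
For reference, the 78.2% average above is consistent with an unweighted mean of the two listed scores (both from the same organization); a quick check, assuming each model counts equally:

```python
# Sanity check of the reported average (assumption: each model weighted equally).
scores = [82.6, 73.8]
print(round(sum(scores) / len(scores), 1))  # 78.2
```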
Leaderboard
2 models ranked by performance on HumanEval-Mul
Rank | Release Date | License / Links | Score
#1 | Dec 25, 2024 | MIT + Model License (Commercial use allowed) | 82.6%
#2 | May 8, 2024 | deepseek | 73.8%
Resources