SWE-bench Multilingual

About

SWE-bench Multilingual is a software engineering benchmark that extends the original (Python-only) SWE-bench to cover Java, TypeScript, JavaScript, Go, Rust, C, and C++. Each task asks a model to resolve a real-world issue drawn from an open-source repository, testing multilingual code understanding and debugging capabilities in authentic development scenarios.
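The percentage scores below are resolve rates: the fraction of benchmark issues for which a model's generated patch passes the repository's tests. A minimal sketch of that metric, using hypothetical instance IDs and results:

```python
# Minimal sketch of a SWE-bench-style resolve rate. The instance IDs and
# pass/fail results below are hypothetical, for illustration only.
def resolve_rate(results: dict[str, bool]) -> float:
    """Fraction of instances whose generated patch passed all tests."""
    if not results:
        return 0.0
    return sum(results.values()) / len(results)

results = {
    "repo-a__issue-101": True,   # patch applied and tests passed
    "repo-b__issue-202": False,  # patch failed the held-out tests
    "repo-c__issue-303": True,
}
print(f"{resolve_rate(results):.1%}")  # → 66.7%
```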

Evaluation Stats

Total Models: 5
Organizations: 2
Verified Results: 0
Self-Reported: 5
Benchmark Details

Max Score: 1
Language: en
Performance Overview
Score distribution and top performers

Score Distribution

Models: 5
Top Score: 57.9%
Average Score: 47.5%
High Performers (80%+): 0

Top Organizations

#1 DeepSeek: 3 models, 47.6% average
#2 Moonshot AI: 2 models, 47.3% average
Leaderboard
5 models ranked by performance on SWE-bench Multilingual

Date          License  Score
Sep 29, 2025  MIT      57.9%
Jan 10, 2025  MIT      54.5%
Jul 11, 2025  MIT      47.3%
Sep 5, 2025   MIT      47.3%
May 28, 2025  MIT      30.5%
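The summary figures in the Performance Overview can be reproduced directly from the five leaderboard scores:

```python
# Scores from the leaderboard above (percent of issues resolved).
scores = [57.9, 54.5, 47.3, 47.3, 30.5]

print(f"Top score: {max(scores):.1f}%")              # → Top score: 57.9%
print(f"Average:   {sum(scores) / len(scores):.1f}%")  # → Average:   47.5%
```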
Resources