Multi-SWE-Bench

About

A multilingual issue-resolution benchmark that evaluates Large Language Models' ability to resolve software issues across diverse programming ecosystems. It covers 7 programming languages (Java, TypeScript, JavaScript, Go, Rust, C, and C++) with 1,632 high-quality instances carefully annotated by 68 expert annotators, addressing the limitation that existing benchmarks focus almost exclusively on Python.
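Since the benchmark spans seven languages, results are typically reported as a per-language resolve rate. The sketch below illustrates that aggregation on hypothetical instance records; the field names (`instance_id`, `language`, `resolved`) are assumptions for illustration, not the benchmark's actual schema.

```python
from collections import defaultdict

# Hypothetical evaluation records; the real Multi-SWE-Bench schema may differ.
results = [
    {"instance_id": "repo-a-1", "language": "java", "resolved": True},
    {"instance_id": "repo-a-2", "language": "java", "resolved": False},
    {"instance_id": "repo-b-1", "language": "rust", "resolved": True},
    {"instance_id": "repo-c-1", "language": "go", "resolved": False},
]

def resolve_rate_by_language(records):
    """Return {language: fraction of instances marked resolved}."""
    totals = defaultdict(int)
    resolved = defaultdict(int)
    for r in records:
        totals[r["language"]] += 1
        resolved[r["language"]] += int(r["resolved"])
    return {lang: resolved[lang] / totals[lang] for lang in totals}

print(resolve_rate_by_language(results))
```

Reporting per language rather than a single pooled score keeps one over-represented ecosystem from dominating the headline number.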

Evaluation Stats
Total Models: 0
Organizations: 0
Verified Results: 0
Self-Reported: 0
Benchmark Details
Max Score: 1
Language: en
+
+
+
+
Performance Overview
Score distribution and top performers

No evaluation results available for this benchmark

Resources