Multi-SWE-Bench

About

A multilingual issue-resolution benchmark that evaluates Large Language Models' ability to resolve software issues across diverse programming ecosystems. It covers 7 programming languages (Java, TypeScript, JavaScript, Go, Rust, C, and C++) with 1,632 high-quality instances carefully annotated by 68 expert annotators, addressing the limitation that existing benchmarks focus almost exclusively on Python.
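Since the benchmark spans seven languages, results are typically reported as a per-language resolve rate. The sketch below illustrates that aggregation on hypothetical instance records; the field names (`instance_id`, `language`, `resolved`) are assumptions for illustration, not the benchmark's actual schema.

```python
from collections import defaultdict

# Hypothetical evaluation records; the real Multi-SWE-Bench schema may differ.
results = [
    {"instance_id": "repo-a-1", "language": "java", "resolved": True},
    {"instance_id": "repo-a-2", "language": "java", "resolved": False},
    {"instance_id": "repo-b-1", "language": "rust", "resolved": True},
    {"instance_id": "repo-c-1", "language": "go", "resolved": False},
]

def resolve_rate_by_language(records):
    """Return {language: fraction of instances marked resolved}."""
    totals = defaultdict(int)
    resolved = defaultdict(int)
    for r in records:
        totals[r["language"]] += 1
        resolved[r["language"]] += int(r["resolved"])
    return {lang: resolved[lang] / totals[lang] for lang in totals}

print(resolve_rate_by_language(results))
```

Reporting per language rather than a single pooled score keeps one over-represented ecosystem from dominating the headline number.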

Evaluation Stats
Total Models: 0
Organizations: 0
Verified Results: 0
Self-Reported: 0
Benchmark Details
Max Score: 1
Language: en
+
+
+
+
Performance Overview
Score distribution and top performers

No evaluation results available for this benchmark

Resources