SWE-Dev

text
About

SWE-Dev is the SWE-bench development split: 225 software engineering problems drawn from real GitHub issues across 12 popular Python repositories. A language model is given a codebase and a description of an issue, and must edit the codebase to resolve it. Doing so often requires understanding and coordinating changes across multiple functions, classes, and files.

Evaluation Stats
Total Models: 0
Organizations: 0
Verified Results: 0
Self-Reported: 0
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers

No evaluation results available for this benchmark

Resources