SWE-rebench
Coding
About
SWE-rebench evaluates LLM coding agents on real-world software engineering tasks, using resolved rate as the primary metric.
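As a rough illustration, resolved rate is the fraction of benchmark tasks whose generated patch makes the project's tests pass. A minimal Python sketch of that computation, assuming a hypothetical per-task result record (the field names are illustrative, not SWE-rebench's actual schema):

```python
from dataclasses import dataclass

@dataclass
class TaskResult:
    """Outcome of one agent run on one task (hypothetical record, not the real schema)."""
    task_id: str
    resolved: bool  # True if the agent's patch made the task's test suite pass

def resolved_rate(results: list[TaskResult]) -> float:
    """Resolved rate: fraction of tasks the agent resolved."""
    if not results:
        return 0.0
    return sum(r.resolved for r in results) / len(results)

# Illustrative numbers only: 31 of 60 tasks resolved -> ~51.7%.
demo = [TaskResult(task_id=f"task-{i}", resolved=(i < 31)) for i in range(60)]
print(f"{resolved_rate(demo):.1%}")  # 51.7%
```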
Evaluation Stats
Total Models: 18
Organizations: 9
Verified Results: 0
Self-Reported: 0
Benchmark Details
Max Score: 100
Performance Overview
Score distribution and top performers

Score Distribution (18 models)
Top Score: 51.7%
Average Score: 43.7%
High Performers (80%+): 0

Top Organizations
#1 Anthropic: 3 models, avg. 47.5%
#2 Google DeepMind: 2 models, avg. 46.7%
#3 OpenAI: 5 models, avg. 46.3%
#4 Zhipu AI: 2 models, avg. 41.7%
#5 Moonshot AI: 2 models, avg. 40.8%
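The per-organization figures above are simply the mean resolved rate of each organization's models on the leaderboard. A short Python sketch of that aggregation, using the scores listed on this page (grouping models under vendors is inferred from the model names; the output matches the overview up to rounding of the last digit):

```python
# Resolved rates (%) per organization, copied from the leaderboard on this page.
SCORES = {
    "Anthropic": [51.7, 47.1, 43.8],
    "Google DeepMind": [46.7, 46.7],
    "OpenAI": [51.0, 48.5, 45.0, 44.0, 42.9],
    "Zhipu AI": [42.1, 41.3],
    "Moonshot AI": [43.8, 37.9],
}

# Rank organizations by the mean score of their models.
ranking = sorted(
    ((org, sum(vals) / len(vals), len(vals)) for org, vals in SCORES.items()),
    key=lambda row: row[1],
    reverse=True,
)
for rank, (org, mean, n) in enumerate(ranking, start=1):
    print(f"#{rank} {org}: {n} models, avg. {mean:.1f}%")
```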
Leaderboard
18 models ranked by performance on SWE-rebench
| Rank | Model | Release Date | License | Score |
|---|---|---|---|---|
| 1 | Claude Opus 4.6 | Feb 5, 2026 | Proprietary | 51.7% |
| 2 | GPT-5.2 | Dec 11, 2025 | Proprietary | 51.0% |
| 3 | GPT-5.1 Codex Max | Nov 19, 2025 | Proprietary | 48.5% |
| 4 | Claude Sonnet 4.5 | Sep 29, 2025 | Proprietary | 47.1% |
| 5 | Gemini 3 Flash | Dec 17, 2025 | Proprietary | 46.7% |
| 6 | Gemini 3 Pro | Nov 18, 2025 | Proprietary | 46.7% |
| 7 | GPT-5.2 Codex | Jan 14, 2026 | Proprietary | 45.0% |
| 8 | GPT-5 Codex | Sep 23, 2025 | Proprietary | 44.0% |
| 9 | Kimi K2 Thinking | Nov 6, 2025 | Modified MIT | 43.8% |
| 10 | Claude Opus 4.5 | Nov 24, 2025 | Proprietary | 43.8% |
Showing the top 10 of 18 models.
Additional Metrics
Extended metrics for all 18 models on SWE-rebench
| Model | Score (%) | SEM | Cost | Tokens | Pass@5 | Cached Tokens |
|---|---|---|---|---|---|---|
| Claude Opus 4.6 | 51.7 | 0.42% | $0.93 | 1,031,373 | 58.3% | 94.3% |
| GPT-5.2 | 51.0 | 1.04% | $0.76 | 981,139 | 60.4% | 68.3% |
| GPT-5.1 Codex Max | 48.5 | 1.13% | $0.73 | 1,239,950 | 56.3% | 67.1% |
| Claude Sonnet 4.5 | 47.1 | 1.69% | $0.94 | 1,924,648 | 60.4% | 96.4% |
| Gemini 3 Flash | 46.7 | 1.41% | $0.32 | 2,173,478 | 54.2% | 77.5% |
| Gemini 3 Pro | 46.7 | 2.04% | $0.59 | 1,221,222 | 58.3% | 84.6% |
| GPT-5.2 Codex | 45.0 | 1.69% | $0.46 | 579,616 | 54.2% | 66.1% |
| GPT-5 Codex | 44.0 | 2.46% | $0.29 | 580,361 | 55.3% | 86.9% |
| Kimi K2 Thinking | 43.8 | 1.47% | $0.42 | 2,242,684 | 58.3% | 95.1% |
| Claude Opus 4.5 | 43.8 | 0.93% | $1.19 | 1,426,974 | 58.3% | 95.3% |
| GPT-5.1 Codex | 42.9 | 1.25% | $0.64 | 1,790,759 | 50.0% | 84.2% |
| GLM 5 | 42.1 | 1.21% | $0.45 | 1,426,726 | 50.0% | 84.1% |
| GLM-4.7 | 41.3 | 2.12% | $0.27 | 1,866,019 | 56.3% | 94.1% |
| Qwen3 Coder Next | 40.0 | 1.21% | $0.49 | 2,341,400 | 64.6% | 97.6% |
| Minimax M 2.5 | 39.6 | 0.66% | $0.09 | 1,391,598 | 56.3% | 89.5% |
| Kimi K2.5 | 37.9 | 1.21% | $0.18 | 1,156,152 | 50.0% | 90.2% |
| DeepSeek-V3.2 | 37.5 | 1.14% | $0.15 | 2,120,848 | 45.8% | 85.1% |
| Devstral-2-123B | 37.5 | 2.19% | $0.09 | 1,743,224 | 52.1% | 96.6% |
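For the SEM and Pass@5 columns: SEM is presumably the standard error of the mean resolved rate across repeated runs, and Pass@5 the share of tasks resolved in at least one of five attempts. A hedged Python sketch of one common way to compute both quantities (SWE-rebench's exact methodology may differ, and the numbers below are illustrative, not taken from the leaderboard):

```python
import math

def sem(per_run_rates: list[float]) -> float:
    """Standard error of the mean resolved rate across independent runs."""
    n = len(per_run_rates)
    mean = sum(per_run_rates) / n
    variance = sum((r - mean) ** 2 for r in per_run_rates) / (n - 1)
    return math.sqrt(variance / n)

def pass_at_5(successes_per_task: list[int]) -> float:
    """Fraction of tasks resolved in at least one of 5 attempts,
    given the number of successful attempts per task."""
    return sum(1 for c in successes_per_task if c > 0) / len(successes_per_task)

runs = [0.52, 0.50, 0.53, 0.51, 0.49]    # resolved rate of each of 5 runs
tasks = [5, 0, 3, 1, 0, 2, 0, 4, 1, 0]   # successful attempts per task over 5 runs
print(f"mean={sum(runs) / len(runs):.1%}  SEM={sem(runs):.2%}")
print(f"pass@5={pass_at_5(tasks):.0%}")  # 60% of the 10 example tasks
```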