SWE Bench Verified
Coding
About
A verified subset of 500 software engineering problems from real GitHub issues, validated by human annotators.
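Scores on this page are the percentage of the 500 verified tasks a model resolves end to end. A minimal sketch of that conversion, assuming the leaderboard's one-decimal rounding (the `swebench_score` helper is hypothetical, not part of the official harness):

```python
def swebench_score(resolved: int, total: int = 500) -> float:
    """Convert a resolved-issue count into a SWE Bench Verified
    percentage, rounded to one decimal as on this leaderboard."""
    return round(100 * resolved / total, 1)

# Resolving 404 of the 500 verified issues corresponds to 80.8%.
print(swebench_score(404))
```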
Evaluation Stats
Total Models: 21
Organizations: 9
Verified Results: 0
Self-Reported: 0
Benchmark Details
Max Score: 100
Performance Overview
Score distribution and top performers
Score Distribution: 21 models
Top Score: 80.9%
Average Score: 76.5%
High Performers (80%+): 4

Top Organizations
| # | Organization | Models | Avg Score |
|---|---|---|---|
| 1 | MiniMax | 1 | 80.2% |
| 2 | Google DeepMind | 2 | 78.0% |
| 3 | Anthropic | 5 | 77.8% |
| 4 | Moonshot AI | 1 | 76.8% |
| 5 | OpenAI | 7 | 76.0% |
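The overview numbers above can be reproduced from the 21 individual scores listed on this page (the 20 entries in the Additional Metrics table plus the 79.6% leaderboard entry). A quick sketch:

```python
# All 21 SWE Bench Verified scores listed on this page.
scores = [80.9, 80.8, 80.2, 80.0, 79.6, 78.0, 78.0, 77.8, 76.8,
          76.3, 76.3, 76.3, 74.9, 74.5, 74.5, 74.4, 73.8, 73.7,
          73.4, 73.3, 73.1]

top = max(scores)                                  # top score: 80.9
average = round(sum(scores) / len(scores), 1)      # average: 76.5
high_performers = sum(s >= 80.0 for s in scores)   # models at 80%+: 4

print(top, average, high_performers)
```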
Leaderboard
21 models ranked by performance on SWE Bench Verified
| # | Model | Date | License | Score |
|---|---|---|---|---|
| 1 | Claude Opus 4.5 | Nov 24, 2025 | Proprietary | 80.9% |
| 2 | Claude Opus 4.6 | Feb 5, 2026 | Proprietary | 80.8% |
| 3 | Minimax M 2.5 | Feb 12, 2026 | MIT | 80.2% |
| 4 | GPT-5.2 | Dec 11, 2025 | Proprietary | 80.0% |
| 5 | — | Feb 17, 2026 | Proprietary | 79.6% |
| 6 | — | Dec 17, 2025 | Proprietary | 78.0% |
| 7 | — | Nov 18, 2025 | Proprietary | 78.0% |
| 8 | GLM 5 | Feb 11, 2026 | MIT | 77.8% |
| 9 | Kimi K2.5 | Jan 27, 2026 | MIT | 76.8% |
| 10 | — | Nov 12, 2025 | Proprietary | 76.3% |
Showing 1 to 10 of 21 models
Additional Metrics
Extended metrics for top models on SWE Bench Verified
| Model | Score | Cost (input / output) | Size | Context |
|---|---|---|---|---|
| Claude Opus 4.5 | 80.9 | $5.00 / $25.00 | — | 200K |
| Claude Opus 4.6 | 80.8 | $5.00 / $25.00 | — | 200K |
| Minimax M 2.5 | 80.2 | $0.30 / $1.20 | 230B | 1.0M |
| GPT-5.2 | 80.0 | $1.75 / $14.00 | — | 400K |
| Gemini 3 Flash | 78.0 | $0.50 / $3.00 | — | 1.0M |
| Gemini 3 Pro | 78.0 | $2.00 / $12.00 | — | 1.0M |
| GLM 5 | 77.8 | $1.00 / $3.20 | 744B | 200K |
| Kimi K2.5 | 76.8 | $0.60 / $2.50 | 1.0T | 262K |
| GPT-5.1 Instant | 76.3 | $1.25 / $10.00 | — | 400K |
| GPT-5.1 | 76.3 | $1.25 / $10.00 | — | 400K |
| GPT-5.1 Thinking | 76.3 | $1.25 / $10.00 | — | 400K |
| GPT-5 | 74.9 | $1.25 / $10.00 | — | 400K |
| Claude Opus 4.1 | 74.5 | $15.00 / $75.00 | — | 200K |
| GPT-5 Codex | 74.5 | — | — | — |
| Step-3.5-Flash | 74.4 | $0.10 / $0.40 | 196B | 66K |
| GLM-4.7 | 73.8 | $0.60 / $2.20 | 358B | 205K |
| GPT-5.1 Codex | 73.7 | $1.25 / $10.00 | — | 400K |
| MiMo-V2-Flash | 73.4 | $0.10 / $0.30 | 309B | 256K |
| Claude Haiku 4.5 | 73.3 | $1.00 / $5.00 | — | 200K |
| DeepSeek-V3.2 Thinking | 73.1 | $0.28 / $0.42 | 685B | 131K |
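Assuming the paired cost figures are input and output prices per million tokens (the standard convention, though the unit is not stated on this page), a run's spend can be estimated as below. The `run_cost` helper and the token counts are illustrative, not part of any official API:

```python
def run_cost(price_in: float, price_out: float,
             tokens_in: int, tokens_out: int) -> float:
    """Estimated dollar cost of a run, given per-million-token
    input/output prices. Hypothetical helper for illustration."""
    return (tokens_in * price_in + tokens_out * price_out) / 1_000_000

# Hypothetical agent run: 2M input tokens and 200K output tokens
# at the $5.00 / $25.00 pricing listed for Claude Opus 4.5.
print(run_cost(5.00, 25.00, 2_000_000, 200_000))  # 15.0 (dollars)
```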