SWE Bench Verified

Coding
About

A verified subset of 500 software engineering problems from real GitHub issues, validated by human annotators.

Evaluation Stats
Total models: 25
Organizations: 11
Verified results: 0
Self-reported: 1
Benchmark Details
Max score: 100
Performance Overview
Score distribution and top performers

Score Distribution

Models: 25
Top score: 80.9%
Average score: 76.7%
High performers (80%+): 5
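The summary figures above follow directly from the 25 individual model scores listed further down the page. A minimal Python sketch, with the scores transcribed from this page's tables:

```python
# Scores (%) for all 25 models on this leaderboard, transcribed from the page.
scores = [
    80.9, 80.8, 80.6, 80.2, 80.0, 79.6, 78.0, 78.0, 77.8, 77.2,
    76.8, 76.5, 76.4, 76.3, 76.3, 76.3, 74.9, 74.5, 74.5, 74.4,
    73.8, 73.7, 73.4, 73.3, 73.1,
]

top = max(scores)                           # top score
avg = round(sum(scores) / len(scores), 1)   # average score
high = sum(1 for s in scores if s >= 80.0)  # high performers (80%+)

print(top, avg, high)  # 80.9 76.7 5
```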

Top Organizations

#1 MiniMax: 1 model, 80.2%
#2 Google DeepMind: 3 models, 78.9%
#3 Anthropic: 6 models, 77.7%
#4 Moonshot AI: 1 model, 76.8%
#5 ByteDance: 1 model, 76.5%
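The organization ranking appears to average the scores of each organization's models: Google DeepMind's 78.9% matches the mean of its three Gemini entries in the metrics table. A sketch of that aggregation, assuming mean-of-models scoring and using two organizations whose model-to-org assignments are visible on this page:

```python
from collections import defaultdict
from statistics import mean

# (organization, score) pairs taken from the metrics table on this page;
# org assignments for the Gemini and Minimax models are assumed.
models = [
    ("Google DeepMind", 80.6),  # Gemini 3.1 Pro
    ("Google DeepMind", 78.0),  # Gemini 3 Flash
    ("Google DeepMind", 78.0),  # Gemini 3 Pro
    ("MiniMax", 80.2),          # Minimax M 2.5
]

by_org = defaultdict(list)
for org, score in models:
    by_org[org].append(score)

# Rank organizations by the mean score of their models, best first.
ranking = sorted(
    ((org, round(mean(s), 1), len(s)) for org, s in by_org.items()),
    key=lambda row: row[1],
    reverse=True,
)
print(ranking)  # [('MiniMax', 80.2, 1), ('Google DeepMind', 78.9, 3)]
```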
Leaderboard
25 models ranked by performance on SWE Bench Verified
Release Date   License       Score
Nov 1, 2025    Proprietary   80.9%
Feb 1, 2026    Proprietary   80.8%
Feb 19, 2026   Proprietary   80.6%
Feb 13, 2026   MIT           80.2%
Dec 11, 2025   Proprietary   80.0%
Feb 17, 2026   Proprietary   79.6%
Dec 17, 2025   Proprietary   78.0%
Nov 18, 2025   Proprietary   78.0%
Feb 11, 2026   MIT           77.8%
Sep 29, 2025   Proprietary   77.2%
Showing 1 to 10 of 25 models
Additional Metrics
Extended metrics for top models on SWE Bench Verified
Model                    Score   Cost              Size    Context
Claude Opus 4.5          80.9    $5.00 / $25.00    —       200K
Claude Opus 4.6          80.8    $5.00 / $25.00    —       200K
Gemini 3.1 Pro           80.6    $2.50 / $15.00    —       1.0M
Minimax M 2.5            80.2    $0.30 / $1.20     230B    1.0M
GPT-5.2                  80.0    $1.75 / $14.00    —       400K
Claude Sonnet 4.6        79.6    $3.00 / $15.00    —       200K
Gemini 3 Flash           78.0    $0.50 / $3.00     —       1.0M
Gemini 3 Pro             78.0    $2.00 / $12.00    —       1.0M
GLM 5                    77.8    $1.00 / $3.20     744B    200K
Kimi K2.5                76.8    $0.60 / $2.50     1.0T    262K
Seed 2.0 Pro             76.5    —                 —       —
Qwen3.5-397B-A17B        76.4    $0.60 / $3.60     397B    262K
GPT-5.1 Thinking         76.3    $1.25 / $10.00    —       400K
GPT-5.1 Instant          76.3    $1.25 / $10.00    —       400K
GPT-5.1                  76.3    $1.25 / $10.00    —       400K
GPT-5                    74.9    $1.25 / $10.00    —       400K
Claude Opus 4.1          74.5    $15.00 / $75.00   —       200K
GPT-5 Codex              74.5    —                 —       —
Step-3.5-Flash           74.4    $0.10 / $0.40     196B    66K
GLM-4.7                  73.8    $0.60 / $2.20     358B    205K
GPT-5.1 Codex            73.7    $1.25 / $10.00    —       400K
MiMo-V2-Flash            73.4    $0.10 / $0.30     309B    256K
Claude Haiku 4.5         73.3    $1.00 / $5.00     —       200K
DeepSeek-V3.2 Thinking   73.1    $0.28 / $0.42     685B    131K