Terminal Bench 2.0
Coding
About
Terminal Bench 2.0 evaluates AI agents on terminal-based tasks that measure real-world command-line proficiency.
Evaluation Stats
Total Models: 12
Organizations: 4
Verified Results: 0
Self-Reported: 0
Benchmark Details
Max Score: 100
Performance Overview
Score distribution and top performers
Score Distribution: 12 models
Top Score: 66.5%
Average Score: 57.0%
High Performers (80%+): 0
Top Organizations
#1 Google DeepMind (2 models): 60.3%
#2 OpenAI (5 models): 59.5%
#3 Anthropic (4 models): 58.8%
#4 MiniMax (1 model): 30.0%
Leaderboard
12 models ranked by performance on Terminal Bench 2.0
| Date | License | Score |
|---|---|---|
| Jan 14, 2026 | Proprietary | 66.5% |
| Feb 5, 2026 | Proprietary | 65.4% |
| Feb 5, 2026 | Proprietary | 64.7% |
| Dec 11, 2025 | Proprietary | 64.7% |
| Dec 17, 2025 | Proprietary | 64.3% |
| Nov 19, 2025 | Proprietary | 60.4% |
| Nov 24, 2025 | Proprietary | 59.8% |
| Feb 17, 2026 | Proprietary | 59.1% |
| Nov 18, 2025 | Proprietary | 56.2% |
| Sep 29, 2025 | Proprietary | 51.0% |
Showing 1 to 10 of 12 models
Additional Metrics
Extended metrics for top models on Terminal Bench 2.0
| Model | Score | Date | Agent | Agent Org | Model Org |
|---|---|---|---|---|---|
| GPT-5.2 Codex | 66.5 | 2026-02-12 | CodeBrain-1 | LangChain | OpenAI |
| Claude Opus 4.6 | 65.4 | 2026-02-05 | Droid | Factory | Anthropic |
| GPT-5.3 Codex | 64.7 | 2026-02-05 | Terminus 2 | Terminal Bench | OpenAI |
| GPT-5.2 | 64.7 | 2025-12-12 | Terminus 2 | Terminal Bench | OpenAI |
| Gemini 3 Flash | 64.3 | 2025-12-23 | Junie CLI | JetBrains | |
| GPT-5.1 Codex Max | 60.4 | 2025-11-24 | Codex CLI | OpenAI | OpenAI |
| Claude Opus 4.5 | 59.8 | 2025-12-22 | Goose | Block | Anthropic |
| Gemini 3 Pro | 56.2 | 2026-01-06 | Ante | Antigma Labs | |
| Claude Sonnet 4.5 | 51.0 | 2025-12-24 | OpenHands | OpenHands | Anthropic |
| GPT-5 Codex | 41.3 | 2025-11-03 | Mini-SWE-Agent | Princeton | OpenAI |
| MiniMax M2.1 | 30.0 | 2025-11-01 | Terminus 2 | Terminal Bench | MiniMax |
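The per-organization figures can be roughly cross-checked against the rows above. The following is a minimal sketch that assumes an organization's figure is the plain mean of its models' scores, which the page does not state. The function name `mean_score_by_org` is hypothetical. Because this table lists only 11 of the 12 models and two rows have an empty Model Org cell, the results need not match the overview figures exactly.

```python
from collections import defaultdict

# (model, score, model_org) copied from the Additional Metrics table.
# Rows whose Model Org cell is empty in the table are recorded as None.
ROWS = [
    ("GPT-5.2 Codex", 66.5, "OpenAI"),
    ("Claude Opus 4.6", 65.4, "Anthropic"),
    ("GPT-5.3 Codex", 64.7, "OpenAI"),
    ("GPT-5.2", 64.7, "OpenAI"),
    ("Gemini 3 Flash", 64.3, None),
    ("GPT-5.1 Codex Max", 60.4, "OpenAI"),
    ("Claude Opus 4.5", 59.8, "Anthropic"),
    ("Gemini 3 Pro", 56.2, None),
    ("Claude Sonnet 4.5", 51.0, "Anthropic"),
    ("GPT-5 Codex", 41.3, "OpenAI"),
    ("MiniMax M2.1", 30.0, "MiniMax"),
]


def mean_score_by_org(rows):
    """Return {org: mean score rounded to 1 decimal}, skipping rows with no org.

    Assumption: the aggregate is a simple arithmetic mean of model scores.
    """
    scores = defaultdict(list)
    for _model, score, org in rows:
        if org is None:
            continue  # Model Org is blank in the source table
        scores[org].append(score)
    return {org: round(sum(vals) / len(vals), 1) for org, vals in scores.items()}


if __name__ == "__main__":
    for org, avg in sorted(mean_score_by_org(ROWS).items(), key=lambda kv: -kv[1]):
        print(f"{org}: {avg}")
```

Under that assumption, the five OpenAI rows average to 59.5 and the single MiniMax row to 30.0, in line with the organization summary.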