Terminal Bench 2.0
Category: Coding
About
Terminal Bench 2.0 evaluates AI agents on terminal-based tasks that measure real-world command-line proficiency.
Evaluation Stats
Total Models: 15
Organizations: 5
Verified Results: 0
Self-Reported: 1
Benchmark Details
Max Score: 100
Performance Overview
Score distribution and top performers
Score Distribution: 15 models
Top Score: 74.8%
Average Score: 54.9%
High Performers (80%+): 0
Top Organizations
| Rank | Organization | Models | Average Score |
|---|---|---|---|
| #1 | OpenAI | 5 | 59.9% |
| #2 | Anthropic | 4 | 58.8% |
| #3 | Google DeepMind | 4 | 52.9% |
| #4 | Moonshot AI | 1 | 43.2% |
| #5 | Zhipu AI | 1 | 33.4% |
Leaderboard
15 models ranked by performance on Terminal Bench 2.0
| Date | License | Score |
|---|---|---|
| Feb 19, 2026 | Proprietary | 74.8% |
| Jan 14, 2026 | Proprietary | 66.5% |
| Feb 1, 2026 | Proprietary | 65.4% |
| Feb 1, 2026 | Proprietary | 64.7% |
| Dec 11, 2025 | Proprietary | 64.7% |
| Dec 17, 2025 | Proprietary | 64.3% |
| Nov 1, 2025 | Proprietary | 60.4% |
| Nov 1, 2025 | Proprietary | 59.8% |
| Feb 17, 2026 | Proprietary | 59.1% |
| Nov 18, 2025 | Proprietary | 56.2% |
Showing 1 to 10 of 15 models
Additional Metrics
Extended metrics for top models on Terminal Bench 2.0
| Model | Score | Date | Agent | Agent Org | Model Org |
|---|---|---|---|---|---|
| Gemini 3.1 Pro | 74.8 | 2026-02-23 | Terminus-KIRA | KRAFTON AI | |
| GPT-5.2 Codex | 66.5 | 2026-02-12 | Deep Agents | LangChain | OpenAI |
| Claude Opus 4.6 | 65.4 | 2026-02-06 | Terminus 2 | Terminal Bench | Anthropic |
| GPT-5.3 Codex | 64.7 | 2026-02-10 | Terminus 2 | Terminal Bench | OpenAI |
| GPT-5.2 | 64.7 | 2025-12-24 | Droid | Factory | OpenAI |
| Gemini 3 Flash | 64.3 | 2025-12-23 | Junie CLI | JetBrains | |
| GPT-5.1 Codex Max | 60.4 | 2025-11-24 | Codex CLI | OpenAI | OpenAI |
| Claude Opus 4.5 | 59.8 | 2025-12-17 | Letta Code | Letta | Anthropic |
| Gemini 3 Pro | 56.2 | 2026-02-23 | SageAgent | OpenSage | |
| Claude Sonnet 4.5 | 51.0 | 2025-12-24 | OpenHands | OpenHands | Anthropic |
| GPT-5 Codex | 43.4 | 2025-10-31 | Terminus 2 | Terminal Bench | OpenAI |
| Kimi K2.5 | 43.2 | 2026-02-04 | Terminus 2 | Terminal Bench | Kimi |
| GLM-4.7 | 33.4 | 2026-01-28 | Terminus 2 | Terminal Bench | Z-AI |
| Gemini 2.5 Flash | 16.4 | 2025-10-31 | OpenHands | OpenHands | |
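The summary figures under Performance Overview can be cross-checked against the per-model scores. The sketch below is a rough illustration, not the site's actual method: it recomputes the top score, the average, the 80%+ count, and the per-organization averages from the rows in the table above. Several inputs are assumptions rather than facts from the page: the Gemini rows are attributed to Google DeepMind (their Model Org cell is empty), "Kimi" and "Z-AI" are mapped to Moonshot AI and Zhipu AI, and the 59.1% leaderboard entry that has no row in this table is included as an unnamed model attributed to Anthropic. The function name `summarize` is hypothetical.

```python
from collections import defaultdict

# (model, score, organization) transcribed from the Additional Metrics table.
# ASSUMPTIONS, not stated in the table itself:
#   - Gemini rows are attributed to "Google DeepMind" (Model Org cell is empty).
#   - "Kimi" and "Z-AI" are mapped to "Moonshot AI" and "Zhipu AI".
#   - The 59.1 leaderboard entry has no model name in the source; it is
#     attributed to Anthropic here purely as a working assumption.
ROWS = [
    ("Gemini 3.1 Pro", 74.8, "Google DeepMind"),
    ("GPT-5.2 Codex", 66.5, "OpenAI"),
    ("Claude Opus 4.6", 65.4, "Anthropic"),
    ("GPT-5.3 Codex", 64.7, "OpenAI"),
    ("GPT-5.2", 64.7, "OpenAI"),
    ("Gemini 3 Flash", 64.3, "Google DeepMind"),
    ("GPT-5.1 Codex Max", 60.4, "OpenAI"),
    ("Claude Opus 4.5", 59.8, "Anthropic"),
    ("(unnamed leaderboard entry)", 59.1, "Anthropic"),
    ("Gemini 3 Pro", 56.2, "Google DeepMind"),
    ("Claude Sonnet 4.5", 51.0, "Anthropic"),
    ("GPT-5 Codex", 43.4, "OpenAI"),
    ("Kimi K2.5", 43.2, "Moonshot AI"),
    ("GLM-4.7", 33.4, "Zhipu AI"),
    ("Gemini 2.5 Flash", 16.4, "Google DeepMind"),
]


def summarize(rows, high_threshold=80.0):
    """Recompute overview statistics from (model, score, org) rows."""
    scores = [score for _, score, _ in rows]

    # Group scores by organization, then average each group.
    by_org = defaultdict(list)
    for _, score, org in rows:
        by_org[org].append(score)
    org_avg = {org: round(sum(v) / len(v), 1) for org, v in by_org.items()}

    return {
        "total_models": len(scores),
        "top_score": max(scores),
        "average_score": round(sum(scores) / len(scores), 1),
        "high_performers": sum(1 for s in scores if s >= high_threshold),
        # Organizations sorted by average score, best first.
        "org_ranking": sorted(org_avg.items(), key=lambda kv: kv[1], reverse=True),
    }


if __name__ == "__main__":
    stats = summarize(ROWS)
    for key, value in stats.items():
        print(f"{key}: {value}")
```

Under these assumptions the recomputed values agree with the published ones (average 54.9, and organization averages of 59.9, 58.8, 52.9, 43.2, and 33.4), which suggests the organization percentages are simple means of each organization's model scores.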