Terminal Bench 2.0

Coding
About

Terminal Bench 2.0 evaluates AI agents on terminal-based tasks, measuring their real-world command-line proficiency.

Evaluation Stats
Total Models: 15
Organizations: 5
Verified Results: 0
Self-Reported: 1
Benchmark Details
Max Score: 100
Performance Overview
Score distribution and top performers

Score Distribution

15 models
Top Score: 74.8%
Average Score: 54.9%
High Performers (80%+): 0

Top Organizations

Rank | Organization | Models | Score
#1 | OpenAI | 5 | 59.9%
#2 | Anthropic | 4 | 58.8%
#3 | Google DeepMind | 4 | 52.9%
#4 | Moonshot AI | 1 | 43.2%
#5 | Zhipu AI | 1 | 33.4%
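
The organization figures above can be cross-checked against the overall Average Score. The sketch below assumes (the page does not say so) that each organization's score is the plain mean of its models' scores, in which case weighting each organization by its model count should reproduce the overall average. The names `ORGS` and `weighted_average` are illustrative only.

```python
# (organization, model count, score in %) as listed under "Top Organizations".
ORGS = [
    ("OpenAI", 5, 59.9),
    ("Anthropic", 4, 58.8),
    ("Google DeepMind", 4, 52.9),
    ("Moonshot AI", 1, 43.2),
    ("Zhipu AI", 1, 33.4),
]


def weighted_average(rows):
    """Mean score across all models, weighting each organization by model count.

    Assumption: each organization's score is the mean of its models' scores.
    """
    total_models = sum(count for _, count, _ in rows)
    weighted_sum = sum(count * score for _, count, score in rows)
    return weighted_sum / total_models


overall = weighted_average(ORGS)
print(f"{overall:.1f}%")  # prints 54.9%
```

Under that assumption the result agrees with the reported Average Score of 54.9%, up to rounding of the per-organization figures.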
Leaderboard
15 models ranked by performance on Terminal Bench 2.0
Date | License | Score
Feb 19, 2026 | Proprietary | 74.8%
Jan 14, 2026 | Proprietary | 66.5%
Feb 1, 2026 | Proprietary | 65.4%
Feb 1, 2026 | Proprietary | 64.7%
Dec 11, 2025 | Proprietary | 64.7%
Dec 17, 2025 | Proprietary | 64.3%
Nov 1, 2025 | Proprietary | 60.4%
Nov 1, 2025 | Proprietary | 59.8%
Feb 17, 2026 | Proprietary | 59.1%
Nov 18, 2025 | Proprietary | 56.2%
Showing 1 to 10 of 15 models
Additional Metrics
Extended metrics for top models on Terminal Bench 2.0
Model | Score | Date | Agent | Agent Org | Model Org
Gemini 3.1 Pro | 74.8 | 2026-02-23 | Terminus-KIRA | KRAFTON AI | Google
GPT-5.2 Codex | 66.5 | 2026-02-12 | Deep Agents | LangChain | OpenAI
Claude Opus 4.6 | 65.4 | 2026-02-06 | Terminus 2 | Terminal Bench | Anthropic
GPT-5.3 Codex | 64.7 | 2026-02-10 | Terminus 2 | Terminal Bench | OpenAI
GPT-5.2 | 64.7 | 2025-12-24 | Droid | Factory | OpenAI
Gemini 3 Flash | 64.3 | 2025-12-23 | Junie CLI | JetBrains | Google
GPT-5.1 Codex Max | 60.4 | 2025-11-24 | Codex CLI | OpenAI | OpenAI
Claude Opus 4.5 | 59.8 | 2025-12-17 | Letta Code | Letta | Anthropic
Gemini 3 Pro | 56.2 | 2026-02-23 | SageAgent | OpenSage | Google
Claude Sonnet 4.5 | 51.0 | 2025-12-24 | OpenHands | OpenHands | Anthropic
GPT-5 Codex | 43.4 | 2025-10-31 | Terminus 2 | Terminal Bench | OpenAI
Kimi K2.5 | 43.2 | 2026-02-04 | Terminus 2 | Terminal Bench | Kimi
GLM-4.7 | 33.4 | 2026-01-28 | Terminus 2 | Terminal Bench | Z-AI
Gemini 2.5 Flash | 16.4 | 2025-10-31 | OpenHands | OpenHands | Google
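
For readers who want to summarize this table themselves, here is a minimal sketch that groups the rows by Model Org and averages the scores. The variable names (`ROWS`, `means`) are illustrative. This table lists 14 rows while the page counts 15 models, so a per-organization mean computed here need not match the Top Organizations summary exactly.

```python
from collections import defaultdict
from statistics import mean

# (model, score, model org) copied from the Additional Metrics table.
ROWS = [
    ("Gemini 3.1 Pro", 74.8, "Google"),
    ("GPT-5.2 Codex", 66.5, "OpenAI"),
    ("Claude Opus 4.6", 65.4, "Anthropic"),
    ("GPT-5.3 Codex", 64.7, "OpenAI"),
    ("GPT-5.2", 64.7, "OpenAI"),
    ("Gemini 3 Flash", 64.3, "Google"),
    ("GPT-5.1 Codex Max", 60.4, "OpenAI"),
    ("Claude Opus 4.5", 59.8, "Anthropic"),
    ("Gemini 3 Pro", 56.2, "Google"),
    ("Claude Sonnet 4.5", 51.0, "Anthropic"),
    ("GPT-5 Codex", 43.4, "OpenAI"),
    ("Kimi K2.5", 43.2, "Kimi"),
    ("GLM-4.7", 33.4, "Z-AI"),
    ("Gemini 2.5 Flash", 16.4, "Google"),
]

# Collect scores per model organization.
scores_by_org = defaultdict(list)
for _model, score, org in ROWS:
    scores_by_org[org].append(score)

# Mean score per organization, sorted from highest to lowest.
means = {org: mean(scores) for org, scores in scores_by_org.items()}
for org, avg in sorted(means.items(), key=lambda item: item[1], reverse=True):
    print(f"{org}: {len(scores_by_org[org])} models, {avg:.1f}")
```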
Resources