TAU2-Bench Telecom

Agents
About

TAU2-Bench Telecom evaluates conversational AI agents on telecommunications support tasks. It uses a dual-control Dec-POMDP framework in which the agent and the user each hold a distinct set of tools. To resolve a task, the agent must diagnose connectivity issues, take backend steps itself, and coordinate with the user, who performs the device-side actions.
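The dual-control idea can be illustrated with a minimal sketch. Everything in it is hypothetical: the `SharedWorld` state, the tool names `resume_line` and `toggle_airplane_mode`, and the success condition are assumptions made for illustration, not the benchmark's actual environment or API. The point it shows is only that neither party can restore service alone, because each tool set changes a different part of the shared state.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class SharedWorld:
    """Hypothetical shared state; each party controls only part of it."""
    airplane_mode: bool = True    # device-side, changed only by the user
    line_suspended: bool = True   # backend-side, changed only by the agent

    def has_service(self) -> bool:
        # Service requires both the device and the backend to be in order.
        return not self.airplane_mode and not self.line_suspended


def make_agent_tools(world: SharedWorld) -> Dict[str, Callable[[], str]]:
    """Backend tools (assumed names) available to the agent only."""
    def resume_line() -> str:
        world.line_suspended = False
        return "line resumed"
    return {"resume_line": resume_line}


def make_user_tools(world: SharedWorld) -> Dict[str, Callable[[], str]]:
    """Device tools (assumed names) available to the user only."""
    def toggle_airplane_mode() -> str:
        world.airplane_mode = not world.airplane_mode
        return "airplane mode on" if world.airplane_mode else "airplane mode off"
    return {"toggle_airplane_mode": toggle_airplane_mode}


world = SharedWorld()
agent_tools = make_agent_tools(world)
user_tools = make_user_tools(world)

# The agent acts on the backend; the device is still in airplane mode.
agent_tools["resume_line"]()
service_after_agent_only = world.has_service()

# The agent asks the user to act on the device; now both sides are fixed.
user_tools["toggle_airplane_mode"]()
service_after_both = world.has_service()
```

In this toy setup the task succeeds only after the agent's backend step and the user's device step have both been taken, which is the coordination burden the benchmark description refers to.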

Evaluation Stats
Total Models: 6
Organizations: 3
Verified Results: 0
Self-Reported: 6
Benchmark Details
Max Score: 100
Performance Overview
Score distribution and top performers

Score Distribution

Models: 6
Top Score: 99.3%
Average Score: 98.3%
High Performers (80%+): 6

Top Organizations

#1 OpenAI: 1 model, 98.7%
#2 Anthropic: 4 models, 98.4%
#3 Google DeepMind: 1 model, 98.0%
Leaderboard
6 models ranked by performance on TAU2-Bench Telecom
Date           License       Score
Feb 1, 2026    Proprietary   99.3%
Dec 11, 2025   Proprietary   98.7%
Nov 1, 2025    Proprietary   98.2%
Sep 29, 2025   Proprietary   98.0%
Nov 18, 2025   Proprietary   98.0%
Feb 17, 2026   Proprietary   97.9%
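
The summary figures reported on this page can be rechecked from the six leaderboard scores. The short sketch below does that arithmetic; the variable names are illustrative, and the 80% threshold is taken from the "High Performers (80%+)" label. The six scores sum to 590.1, so the exact mean is 98.35, which is consistent with the 98.3% shown if the page truncates to one decimal place (an assumption, since the page does not say how it rounds).

```python
from statistics import mean

# Scores as listed in the leaderboard, in percent.
scores = [99.3, 98.7, 98.2, 98.0, 98.0, 97.9]

top_score = max(scores)
average_score = mean(scores)

# Count models at or above the 80% "high performer" threshold.
high_performers = sum(1 for s in scores if s >= 80.0)

print(f"top={top_score:.1f}% avg={average_score:.2f}% high={high_performers}")
```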