TAU2-Bench Telecom
Agents
About
TAU2-Bench Telecom evaluates conversational AI agents on telecommunications support. It uses a dual-control Dec-POMDP (decentralized partially observable Markov decision process) framework in which the agent and the user each hold a distinct set of tools. Agents must diagnose connectivity issues, take backend steps themselves, and coordinate with users who perform the device-side actions.
Evaluation Stats
Total Models: 6
Organizations: 3
Verified Results: 0
Self-Reported: 6
Benchmark Details
Max Score: 100
Performance Overview
Score distribution and top performers
Score Distribution: 6 models
Top Score: 99.3%
Average Score: 98.3%
High Performers (80%+): 6
Top Organizations
#1 OpenAI: 1 model, 98.7%
#2 Anthropic: 4 models, 98.4%
#3 Google DeepMind: 1 model, 98.0%
Leaderboard
6 models ranked by performance on TAU2-Bench Telecom
| Date | License | Score |
|---|---|---|
| Feb 1, 2026 | Proprietary | 99.3% |
| Dec 11, 2025 | Proprietary | 98.7% |
| Nov 1, 2025 | Proprietary | 98.2% |
| Sep 29, 2025 | Proprietary | 98.0% |
| Nov 18, 2025 | Proprietary | 98.0% |
| Feb 17, 2026 | Proprietary | 97.9% |
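
The summary figures in the Performance Overview can be checked against the six leaderboard scores. The following is a minimal sketch, assuming the overview is computed directly from the listed scores; the variable names are illustrative, and the page does not state how the site rounds its averages.

```python
from statistics import mean

# Scores (%) copied from the six leaderboard rows, in listed order.
scores = [99.3, 98.7, 98.2, 98.0, 98.0, 97.9]

# Threshold taken from the "High Performers (80%+)" label.
HIGH_PERFORMER_THRESHOLD = 80.0

top_score = max(scores)
average_score = mean(scores)
high_performers = sum(1 for s in scores if s >= HIGH_PERFORMER_THRESHOLD)

print(f"Top score: {top_score:.1f}%")
print(f"Average score: {average_score:.2f}%")
print(f"High performers (80%+): {high_performers}")
```

The mean of the listed one-decimal scores sits at about 98.35%, which is in line with the 98.3% shown if the site averages more precise underlying values or truncates rather than rounds (an assumption, since the page does not say).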