TAU2-Bench Telecom

Agents

About

TAU2-Bench Telecom evaluates conversational AI agents on telecommunications support tasks. It uses a dual-control Dec-POMDP (decentralized partially observable Markov decision process) framework in which the agent and the user each hold distinct tools: the agent must diagnose connectivity issues and take backend steps while coordinating with a user who performs the device-side actions.
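The dual-control idea can be illustrated with a toy model. The sketch below is not the benchmark's code or API: the `World` fields, the tool names `resume_line` and `turn_off_airplane_mode`, and the `act` dispatcher are all hypothetical, chosen only to show a task that neither side can complete alone.

```python
from dataclasses import dataclass


@dataclass
class World:
    """Toy shared state; each side can change only part of it (hypothetical)."""
    line_suspended: bool = True   # backend fact, changed only by the agent
    airplane_mode: bool = True    # device fact, changed only by the user

    def has_service(self) -> bool:
        return not self.line_suspended and not self.airplane_mode


def resume_line(world: World) -> str:
    """Hypothetical agent-side (backend) tool."""
    world.line_suspended = False
    return "line resumed"


def turn_off_airplane_mode(world: World) -> str:
    """Hypothetical user-side (device) tool."""
    world.airplane_mode = False
    return "airplane mode off"


AGENT_TOOLS = {"resume_line": resume_line}
USER_TOOLS = {"turn_off_airplane_mode": turn_off_airplane_mode}


def act(role: str, tool: str, world: World) -> str:
    """Dispatch a tool call, rejecting tools the role does not hold."""
    tools = AGENT_TOOLS if role == "agent" else USER_TOOLS
    if tool not in tools:
        raise PermissionError(f"{role} cannot call {tool}")
    return tools[tool](world)


world = World()
act("agent", "resume_line", world)
solved_by_agent_alone = world.has_service()   # backend fixed, device still blocked
act("user", "turn_off_airplane_mode", world)
solved_jointly = world.has_service()          # both sides have acted
```

In this toy setup the agent's backend step alone does not restore service; the agent has to talk the user through the device-side step, which is the coordination the benchmark description refers to.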

Evaluation Stats

Total Models: 6

Organizations: 3

Verified Results: 0

Self-Reported: 6

Benchmark Details

Max Score: 100

Performance Overview

Score distribution and top performers

Score Distribution

6 models

Top Score: 99.3%

Average Score: 98.3%

High Performers (80%+)

Top Organizations

#1	OpenAI	1 model	98.7%
#2	Anthropic	4 models	98.4%
#3	Google DeepMind	1 model	98.0%

Leaderboard

6 models ranked by performance on TAU2-Bench Telecom

Rank	Model	Organization	Date	License	Score
#01	Claude Opus 4.6	Anthropic	Feb 1, 2026	Proprietary	99.3%
#02	GPT-5.2	OpenAI	Dec 11, 2025	Proprietary	98.7%
#03	Claude Opus 4.5	Anthropic	Nov 1, 2025	Proprietary	98.2%
#04	Claude Sonnet 4.5	Anthropic	Sep 29, 2025	Proprietary	98.0%
#05	Gemini 3 Pro	Google DeepMind	Nov 18, 2025	Proprietary	98.0%
#06	Claude Sonnet 4.6	Anthropic	Feb 17, 2026	Proprietary	97.9%
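The summary figures on this page can be recomputed from the six leaderboard rows. The sketch below assumes that an organization's score is the unweighted mean of its models' scores; the page does not state its aggregation or rounding method, so treat this as one plausible reconstruction. The variable names are illustrative.

```python
from collections import defaultdict

# (model, organization, score in percent), copied from the leaderboard rows.
ROWS = [
    ("Claude Opus 4.6", "Anthropic", 99.3),
    ("GPT-5.2", "OpenAI", 98.7),
    ("Claude Opus 4.5", "Anthropic", 98.2),
    ("Claude Sonnet 4.5", "Anthropic", 98.0),
    ("Gemini 3 Pro", "Google DeepMind", 98.0),
    ("Claude Sonnet 4.6", "Anthropic", 97.9),
]

scores = [score for _, _, score in ROWS]
top_score = max(scores)
average_score = sum(scores) / len(scores)
high_performers = sum(score >= 80.0 for score in scores)  # models at 80% or above

# Assumption: organizations are ranked by the plain mean of their models' scores.
by_org = defaultdict(list)
for _, org, score in ROWS:
    by_org[org].append(score)
org_means = {org: sum(vals) / len(vals) for org, vals in by_org.items()}
org_ranking = sorted(org_means, key=org_means.get, reverse=True)

print(f"top={top_score} mean={average_score:.2f} high_performers={high_performers}")
for rank, org in enumerate(org_ranking, start=1):
    print(f"#{rank} {org}: {org_means[org]:.2f} ({len(by_org[org])} model(s))")
```

Under this assumption both the overall mean and the Anthropic mean come out at about 98.35, which sits on the rounding boundary between the 98.3% and 98.4% shown on the page, and the organization order matches the Top Organizations list.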
