TAU2-Bench Retail
Agents
About
TAU2-Bench Retail evaluates conversational AI agents on customer service tasks in a retail environment. It uses a dual-control framework in which both the agent and the user hold tools, and it tests policy adherence, tool use, and task success across returns, exchanges, and order management scenarios.
Evaluation Stats
Total Models: 6
Organizations: 3
Verified Results: 0
Self-Reported: 6
Benchmark Details
Max Score: 100
Performance Overview
Score distribution and top performers
Score Distribution: 6 models
Top Score: 91.9%
Average Score: 87.7%
High Performers (80%+): 6
Top Organizations
#1 Anthropic: 4 models, 89.7%
#2 Google DeepMind: 1 model, 85.3%
#3 OpenAI: 1 model, 82.0%
Leaderboard
6 models ranked by performance on TAU2-Bench Retail
| Date | License | Score |
|---|---|---|
| Feb 1, 2026 | Proprietary | 91.9% |
| Feb 17, 2026 | Proprietary | 91.7% |
| Nov 1, 2025 | Proprietary | 88.9% |
| Sep 29, 2025 | Proprietary | 86.2% |
| Nov 18, 2025 | Proprietary | 85.3% |
| Dec 11, 2025 | Proprietary | 82.0% |
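
The summary figures in the Performance Overview can be checked against the six leaderboard scores above. The sketch below is a minimal illustration, not the site's own method: it assumes the average is a plain arithmetic mean rounded to one decimal place, and the variable names are hypothetical.

```python
from statistics import mean

# Scores (in percent) copied from the six leaderboard rows, highest first.
scores = [91.9, 91.7, 88.9, 86.2, 85.3, 82.0]

# Cutoff taken from the page's "High Performers (80%+)" label.
HIGH_PERFORMER_THRESHOLD = 80.0

top_score = max(scores)

# Assumption: unweighted mean, rounded to one decimal place.
average_score = round(mean(scores), 1)

# Count the models at or above the cutoff.
high_performers = sum(1 for s in scores if s >= HIGH_PERFORMER_THRESHOLD)

print(f"Top score: {top_score}%")
print(f"Average score: {average_score}%")
print(f"High performers (80%+): {high_performers}")
```

Under these assumptions the computed values agree with the reported top score (91.9%), average score (87.7%), and high-performer count (6).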