Tau2 Retail
About
Tau2 Retail is the retail domain of the τ²-Bench framework. It evaluates conversational agents acting as customer service representatives in an e-commerce setting, testing how well they handle product queries, process orders, resolve customer support requests, and follow retail-specific policies. Evaluation takes place in τ²-Bench's structured dual-control environment, which is designed to mirror real-world retail customer service scenarios.
Evaluation Stats
Total Models: 10
Organizations: 4
Verified Results: 0
Self-Reported: 10
Benchmark Details
Max Score: 1
Language: English (en)
Performance Overview
Score distribution and top performers
Score Distribution: 10 models
Top Score: 83.2%
Average Score: 71.7%
High Performers (80%+): 3

Top Organizations
#1 Anthropic: 1 model, 83.2% average
#2 OpenAI: 3 models, 74.9% average
#3 Moonshot AI: 2 models, 70.6% average
#4 Alibaba Cloud / Qwen Team: 4 models, 67.1% average
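The organization figures match each organization's mean score across its models. The Python sketch below reproduces them. The per-organization score lists are an assumption: the leaderboard names neither models nor organizations, so the grouping is inferred by matching license types to the model counts listed here (Proprietary rows to Anthropic and OpenAI, MIT rows to Moonshot AI, Apache 2.0 rows to Alibaba Cloud / Qwen Team). `SCORES_BY_ORG` and `rank_organizations` are hypothetical names.

```python
from statistics import mean

# Scores (in %) grouped by organization.
# ASSUMPTION: this grouping is inferred from license types and model counts;
# the source page does not list which model belongs to which organization.
SCORES_BY_ORG = {
    "Anthropic": [83.2],
    "OpenAI": [81.1, 80.2, 63.4],
    "Moonshot AI": [70.6, 70.6],
    "Alibaba Cloud / Qwen Team": [71.9, 71.3, 67.8, 57.3],
}


def rank_organizations(scores_by_org):
    """Return (org, model_count, mean_score) tuples, highest mean first."""
    rows = [
        (org, len(scores), round(mean(scores), 1))
        for org, scores in scores_by_org.items()
    ]
    return sorted(rows, key=lambda row: row[2], reverse=True)


for rank, (org, count, avg) in enumerate(rank_organizations(SCORES_BY_ORG), start=1):
    print(f"#{rank} {org}: {count} model(s), {avg}%")
```

Ranking by the mean rather than the best score explains why Alibaba Cloud / Qwen Team places last even though its best model (71.9%) outscores both Moonshot AI entries (70.6%).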
Leaderboard
10 models ranked by performance on Tau2 Retail
| Date | License | Score |
|---|---|---|
| Oct 15, 2025 | Proprietary | 83.2% |
| Aug 7, 2025 | Proprietary | 81.1% |
| Apr 16, 2025 | Proprietary | 80.2% |
| Jul 25, 2025 | Apache 2.0 | 71.9% |
| Jul 22, 2025 | Apache 2.0 | 71.3% |
| Jul 11, 2025 | MIT | 70.6% |
| Sep 5, 2025 | MIT | 70.6% |
| Sep 10, 2025 | Apache 2.0 | 67.8% |
| Aug 6, 2024 | Proprietary | 63.4% |
| Sep 10, 2025 | Apache 2.0 | 57.3% |
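The summary statistics under Performance Overview can be recomputed from the Score column of this leaderboard. The minimal Python sketch below does so; the function `summarize` and the constant names are illustrative, and the fraction conversion assumes that a Max Score of 1 corresponds to 100%.

```python
# Scores from the leaderboard, in percent, best first.
LEADERBOARD_SCORES = [83.2, 81.1, 80.2, 71.9, 71.3, 70.6, 70.6, 67.8, 63.4, 57.3]

# Hypothetical constant mirroring the "High Performers (80%+)" bucket.
HIGH_PERFORMER_THRESHOLD = 80.0


def summarize(scores, threshold=HIGH_PERFORMER_THRESHOLD):
    """Compute the overview statistics from a list of percentage scores."""
    return {
        "models": len(scores),
        "top_score": max(scores),
        "average_score": round(sum(scores) / len(scores), 1),
        "high_performers": sum(1 for s in scores if s >= threshold),
        # ASSUMPTION: "Max Score 1" means a score of 1.0 equals 100%.
        "top_score_fraction": round(max(scores) / 100, 3),
    }


print(summarize(LEADERBOARD_SCORES))
```

The recomputed values (10 models, top score 83.2%, average 71.7%, 3 models at 80% or above) agree with the figures reported in the overview.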