Tau2 Retail

About

Tau2 Retail is the retail component of the τ²-Bench framework. It evaluates conversational agents in e-commerce and retail customer service environments, testing how well they handle product queries, order processing, customer support interactions, and retail-specific policies within a structured dual-control evaluation environment modeled on real-world retail customer service scenarios.

Evaluation Stats
Total Models: 9
Organizations: 3
Verified Results: 0
Self-Reported: 9
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers

Score Distribution

9 models
Top Score: 81.1%
Average Score: 70.5%
High Performers (80%+): 2

Top Organizations

#1 OpenAI: 3 models, 74.9%
#2 Moonshot AI: 2 models, 70.6%
#3 Alibaba Cloud / Qwen Team: 4 models, 67.1%
Leaderboard
9 models ranked by performance on Tau2 Retail
Date            License       Score
Aug 7, 2025     Proprietary   81.1%
Apr 16, 2025    Proprietary   80.2%
Jul 25, 2025    Apache 2.0    71.9%
Jul 22, 2025    Apache 2.0    71.3%
Jul 11, 2025    MIT           70.6%
Sep 5, 2025     MIT           70.6%
Sep 10, 2025    Apache 2.0    67.8%
Aug 6, 2024     Proprietary   63.4%
Sep 10, 2025    Apache 2.0    57.3%
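
The summary figures in the Performance Overview can be checked against the nine leaderboard scores. The sketch below is only an illustration: it assumes the average is a plain arithmetic mean rounded to one decimal place and that "High Performers" means a score of at least 80%. The function name `summarize` and its `threshold` parameter are hypothetical and not part of any τ²-Bench tooling.

```python
from statistics import mean

# Scores (in percent) copied from the leaderboard rows, highest first.
scores = [81.1, 80.2, 71.9, 71.3, 70.6, 70.6, 67.8, 63.4, 57.3]


def summarize(scores, threshold=80.0):
    """Recompute the overview statistics from a list of percentage scores.

    Assumption: 'high performers' are models scoring >= threshold.
    """
    return {
        "models": len(scores),
        "top": max(scores),
        "average": round(mean(scores), 1),  # assumed: simple mean, one decimal
        "high_performers": sum(s >= threshold for s in scores),
    }


summary = summarize(scores)
print(summary)
```

Under these assumptions the recomputed values agree with the page: 9 models, a top score of 81.1%, an average of 70.5%, and 2 high performers.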
Resources