TAU-bench Retail
text
About
TAU-bench Retail is the retail-domain subset of the TAU-bench benchmark. It tests AI agents on e-commerce and retail customer service scenarios that require specialized APIs (tools) and adherence to business policies. Models must handle product inquiries, order management, returns, and other customer support workflows while using tools accurately and following the domain's policies.
Evaluation Stats
Total Models: 22
Organizations: 4
Verified Results: 0
Self-Reported: 22
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers
Score distribution (22 models):
Top Score: 86.2%
Average Score: 67.5%
High Performers (80%+): 5

Top Organizations
#1 Zhipu AI: 2 models, average 78.8%
#2 Anthropic: 7 models, average 76.0%
#3 Alibaba Cloud / Qwen Team: 3 models, average 66.1%
#4 OpenAI: 10 models, average 59.8%
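As an arithmetic check, assuming the Average Score is the unweighted mean over all 22 models, it should equal the mean of the per-organization averages weighted by each organization's model count. The per-organization figures are themselves rounded to one decimal, so the check holds only to that precision. A minimal Python sketch (the names `ORGS` and `overall_average` are illustrative, not part of any benchmark tooling):

```python
# Per-organization figures as listed above: (model count, average score in %).
ORGS = {
    "Zhipu AI": (2, 78.8),
    "Anthropic": (7, 76.0),
    "Alibaba Cloud / Qwen Team": (3, 66.1),
    "OpenAI": (10, 59.8),
}


def overall_average(orgs):
    """Return (total model count, model-count-weighted mean of org averages)."""
    total_models = sum(n for n, _ in orgs.values())
    weighted_sum = sum(n * avg for n, avg in orgs.values())
    return total_models, weighted_sum / total_models


total, avg = overall_average(ORGS)
print(total, round(avg, 1))  # 22 67.5
```

The result, about 67.54%, rounds to the reported 67.5%, and the model counts sum to the reported 22 models.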
Leaderboard
22 models ranked by performance on TAU-bench Retail
Date | License | Score
---|---|---
Sep 29, 2025 | Proprietary | 86.2%
Aug 5, 2025 | Proprietary | 82.4%
May 22, 2025 | Proprietary | 81.4%
Feb 24, 2025 | Proprietary | 81.2%
May 22, 2025 | Proprietary | 80.5%
Jul 28, 2025 | MIT | 79.7%
Jul 28, 2025 | MIT | 77.9%
Apr 16, 2025 | Proprietary | 71.8%
Dec 17, 2024 | Proprietary | 70.8%
Sep 10, 2025 | Apache 2.0 | 69.6%
Showing 1 to 10 of 22 models
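The High Performers count can be checked against the table. Assuming the leaderboard is sorted by score in descending order (as the ten visible rows are), the 12 models not shown all score at most 69.6%, so counting the visible rows at or above 80% covers the whole bucket. A small Python sketch (`TOP10_SCORES` and `high_performers` are illustrative names):

```python
# Scores (%) from the ten leaderboard rows shown above, in rank order.
TOP10_SCORES = [86.2, 82.4, 81.4, 81.2, 80.5, 79.7, 77.9, 71.8, 70.8, 69.6]


def high_performers(scores, threshold=80.0):
    """Count scores at or above the threshold (the page's "80%+" bucket)."""
    return sum(1 for s in scores if s >= threshold)


print(max(TOP10_SCORES), high_performers(TOP10_SCORES))  # 86.2 5
```

Both values match the Top Score (86.2%) and High Performers (5) figures reported in the overview.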