TAU-bench Retail

About

TAU-bench Retail is the retail subset of the TAU-bench benchmark. It tests AI agents in e-commerce and retail customer service scenarios that come with specialized APIs and business policies. Models must handle product inquiries, order management, returns, customer support, and other retail-specific workflows while using tools accurately and adhering to retail business policies and standards.

Evaluation Stats
Total Models: 22
Organizations: 4
Verified Results: 0
Self-Reported: 22
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers

Score Distribution

Models: 22
Top Score: 86.2%
Average Score: 67.5%
High Performers (80%+): 5

Top Organizations

#1 Zhipu AI: 2 models, 78.8%
#2 Anthropic: 7 models, 76.0%
#3 Alibaba Cloud / Qwen Team: 3 models, 66.1%
#4 OpenAI: 10 models, 59.8%
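The organization figures can be cross-checked against the overall average. The sketch below assumes (the page does not say so explicitly) that each organization's percentage is the mean score of its listed models; under that assumption, the model-count-weighted mean of the four figures should reproduce the reported Average Score. The dictionary layout and the `weighted_mean` helper are illustrative names, not part of the leaderboard.

```python
# (number of models, percentage shown) per organization, copied from the list above.
# Assumption: the percentage is the mean score of that organization's models.
orgs = {
    "Zhipu AI": (2, 78.8),
    "Anthropic": (7, 76.0),
    "Alibaba Cloud / Qwen Team": (3, 66.1),
    "OpenAI": (10, 59.8),
}


def weighted_mean(groups):
    """Return (total model count, model-count-weighted mean score)."""
    total_models = sum(n for n, _ in groups.values())
    total_score = sum(n * score for n, score in groups.values())
    return total_models, total_score / total_models


n_models, overall = weighted_mean(orgs)
print(n_models, round(overall, 1))  # 22 67.5
```

The model counts sum to 22 and the weighted mean rounds to 67.5%, which agrees with the Total Models and Average Score values reported on the page.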
Leaderboard
22 models ranked by performance on TAU-bench Retail
Rank  Date          License      Score
1     Sep 29, 2025  Proprietary  86.2%
2     Aug 5, 2025   Proprietary  82.4%
3     May 22, 2025  Proprietary  81.4%
4     Feb 24, 2025  Proprietary  81.2%
5     May 22, 2025  Proprietary  80.5%
6     Jul 28, 2025  MIT          79.7%
7     Jul 28, 2025  MIT          77.9%
8     Apr 16, 2025  Proprietary  71.8%
9     Dec 17, 2024  Proprietary  70.8%
10    Sep 10, 2025  Apache 2.0   69.6%
Showing 1 to 10 of 22 models
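The summary figures can be partly reproduced from the ten visible rows alone. The minimal sketch below copies the visible scores into a list (the variable names are illustrative) and counts those at or above 80%. Because the leaderboard is ranked by score and the tenth entry is already below 80%, the 12 rows not shown cannot add to that count.

```python
# Scores (in percent) of the ten visible leaderboard rows, in ranked order.
visible_scores = [86.2, 82.4, 81.4, 81.2, 80.5, 79.7, 77.9, 71.8, 70.8, 69.6]

# The leaderboard is ranked by performance, so the list should be descending.
is_descending = visible_scores == sorted(visible_scores, reverse=True)

# "High Performers (80%+)": scores at or above the 80% threshold.
high_performers = [s for s in visible_scores if s >= 80.0]
top_score = max(visible_scores)

print(is_descending, len(high_performers), top_score)  # True 5 86.2
```

This matches the reported Top Score of 86.2% and the High Performers count of 5. The overall average cannot be checked this way, since only 10 of the 22 scores are visible.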
Resources