TAU-bench Airline
About
TAU-bench Airline is the airline-domain subset of the TAU-bench benchmark. It tests how well AI agents handle airline customer service scenarios using domain-specific APIs and policies. Tasks cover flight bookings, cancellations, customer inquiries, and airline-specific procedures, and agents must use tools accurately while following industry regulations and guidelines.
Evaluation Stats
Total Models: 20
Organizations: 4
Verified Results: 0
Self-Reported: 20
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers
Score Distribution: 20 models
Top Score: 70.0%
Average Score: 47.8%
High Performers (80%+): 0
Top Organizations
#1 Zhipu AI: 2 models, 60.6%
#2 Anthropic: 7 models, 53.3%
#3 Alibaba Cloud / Qwen Team: 3 models, 46.3%
#4 OpenAI: 8 models, 40.5%
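As a consistency check on the figures above, the per-organization numbers can be combined into an overall mean. This is a minimal sketch. It assumes each organization's percentage is the simple average of its models' scores, which the page does not state explicitly. The variable names are illustrative.

```python
# (organization, model count, listed score in %) copied from the list above.
# Assumption: each listed score is the mean over that organization's models.
orgs = [
    ("Zhipu AI", 2, 60.6),
    ("Anthropic", 7, 53.3),
    ("Alibaba Cloud / Qwen Team", 3, 46.3),
    ("OpenAI", 8, 40.5),
]

total_models = sum(count for _, count, _ in orgs)

# Weight each organization's average by how many models it contributes.
weighted_mean = sum(count * avg for _, count, avg in orgs) / total_models

print(f"models: {total_models}")
print(f"weighted mean: {weighted_mean:.2f}%")
```

Under that assumption the model counts sum to 20 and the weighted mean comes out at about 47.86%. That is close to the reported average of 47.8%, and the small gap is plausibly due to rounding in the per-organization figures.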
Leaderboard
20 models ranked by performance on TAU-bench Airline
Release Date | License | Score
---|---|---
Sep 29, 2025 | Proprietary | 70.0%
Jul 28, 2025 | MIT | 60.8%
Jul 28, 2025 | MIT | 60.4%
May 22, 2025 | Proprietary | 60.0%
May 22, 2025 | Proprietary | 59.6%
Feb 24, 2025 | Proprietary | 58.4%
Aug 5, 2025 | Proprietary | 56.0%
Feb 27, 2025 | Proprietary | 50.0%
Dec 17, 2024 | Proprietary | 50.0%
Apr 14, 2025 | Proprietary | 49.4%
Showing 1 to 10 of 20 models
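Only ten of the twenty rows are shown, but the reported overall average of 47.8% gives a rough idea of the rest. The sketch below estimates the mean score of the ten models not shown. It assumes the reported average is a simple mean over all 20 models. Because 47.8% is itself rounded, the result is approximate.

```python
# Scores (in %) of the ten rows visible in the leaderboard table.
visible = [70.0, 60.8, 60.4, 60.0, 59.6, 58.4, 56.0, 50.0, 50.0, 49.4]

reported_mean = 47.8   # "Average Score" from the Performance Overview
total_models = 20      # "Total Models" from the Evaluation Stats

visible_mean = sum(visible) / len(visible)

# Total score mass implied by the reported mean, minus what the visible rows
# account for, spread over the rows that are not shown.
hidden_count = total_models - len(visible)
hidden_mean = (reported_mean * total_models - sum(visible)) / hidden_count

print(f"visible mean: {visible_mean:.2f}%")
print(f"implied mean of remaining {hidden_count} models: {hidden_mean:.2f}%")
```

With these inputs the visible rows average about 57.46%, which implies an average of roughly 38.1% for the remaining ten models.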