TAU-bench Airline

About

TAU-bench Airline is the airline-domain subset of the TAU-bench benchmark. It tests AI agents on airline customer service scenarios that involve domain-specific APIs and policies. Models must handle flight bookings, cancellations, customer inquiries, and airline-specific procedures while using tools accurately and following the applicable regulations and guidelines.

Evaluation Stats
Total Models: 20
Organizations: 4
Verified Results: 0
Self-Reported: 20
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers

Score Distribution

Models: 20
Top Score: 70.0%
Average Score: 47.8%
High Performers (80%+): 0

Top Organizations

#1 Zhipu AI (2 models): 60.6%
#2 Anthropic (7 models): 53.3%
#3 Alibaba Cloud / Qwen Team (3 models): 46.3%
#4 OpenAI (8 models): 40.5%
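
As a rough consistency check, the per-organization figures above can be combined into an overall average. The sketch below assumes that each organization's percentage is the mean score of its models and that the overall average is the count-weighted mean of those figures; the page does not state how either number is computed, so treat this as illustrative only. The variable names are hypothetical.

```python
# Per-organization figures as listed on the page:
# (organization, number of models, listed score in %).
# Assumption: the listed score is the mean over that organization's models.
orgs = [
    ("Zhipu AI", 2, 60.6),
    ("Anthropic", 7, 53.3),
    ("Alibaba Cloud / Qwen Team", 3, 46.3),
    ("OpenAI", 8, 40.5),
]

# Total number of models across all organizations.
total_models = sum(count for _, count, _ in orgs)

# Count-weighted mean of the per-organization scores.
weighted_avg = sum(count * avg for _, count, avg in orgs) / total_models

print(total_models)            # 20
print(round(weighted_avg, 2))  # 47.86
```

Under these assumptions the result lands within a tenth of a point of the listed Average Score of 47.8%. The small gap is what one would expect if the per-organization figures were rounded before being displayed.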
Leaderboard
20 models ranked by performance on TAU-bench Airline
Date          License      Score
Sep 29, 2025  Proprietary  70.0%
Jul 28, 2025  MIT          60.8%
Jul 28, 2025  MIT          60.4%
May 22, 2025  Proprietary  60.0%
May 22, 2025  Proprietary  59.6%
Feb 24, 2025  Proprietary  58.4%
Aug 5, 2025   Proprietary  56.0%
Feb 27, 2025  Proprietary  50.0%
Dec 17, 2024  Proprietary  50.0%
Apr 14, 2025  Proprietary  49.4%
Showing 1 to 10 of 20 models
Resources