Tau2 Airline

About

TAU2-airline is part of the τ²-Bench evaluation framework. It tests conversational agents on airline customer service scenarios within a dual-control environment. The benchmark assesses an agent's ability to handle flight-related tasks and provide customer support while remaining consistent and accurate across multiple conversation turns.

Evaluation Stats
Total Models: 9
Organizations: 3
Verified Results: 0
Self-Reported: 9
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers

Score Distribution

Models: 9
Top Score: 64.8%
Average Score: 54.9%
High Performers (80%+): 0

Top Organizations

#1 OpenAI (3 models): 57.6%
#2 Moonshot AI (2 models): 56.5%
#3 Alibaba Cloud / Qwen Team (4 models): 52.0%
Leaderboard
9 models ranked by performance on Tau2 Airline

Date            License       Score
Apr 16, 2025    Proprietary   64.8%
Aug 7, 2025     Proprietary   62.6%
Sep 10, 2025    Apache 2.0    60.5%
Jul 25, 2025    Apache 2.0    58.0%
Sep 5, 2025     MIT           56.5%
Jul 11, 2025    MIT           56.5%
Aug 6, 2024     Proprietary   45.5%
Sep 10, 2025    Apache 2.0    45.5%
Jul 22, 2025    Apache 2.0    44.0%
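
The summary figures under Performance Overview can be checked against the nine leaderboard scores above. The following is a minimal Python sketch, not part of the benchmark tooling: the variable names are illustrative, the scores are copied by hand from the leaderboard in percent, and the 80% threshold is taken from the "High Performers (80%+)" label.

```python
from statistics import mean

# Scores copied by hand from the leaderboard, in percent.
# (Assumption: the page shows percentages of the benchmark's max score of 1.)
scores = [64.8, 62.6, 60.5, 58.0, 56.5, 56.5, 45.5, 45.5, 44.0]

# Threshold taken from the "High Performers (80%+)" label.
HIGH_PERFORMER_THRESHOLD = 80.0

top_score = max(scores)
average_score = round(mean(scores), 1)
high_performers = sum(1 for s in scores if s >= HIGH_PERFORMER_THRESHOLD)

print(f"Models: {len(scores)}")
print(f"Top score: {top_score}%")
print(f"Average score: {average_score}%")
print(f"High performers (80%+): {high_performers}")
```

Rounding the mean to one decimal place is a choice made here to match how the page displays its figures.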
Resources