Tau2 Airline
About
TAU2-airline is part of the τ²-Bench evaluation framework. It tests conversational agents in airline customer service scenarios within a dual-control environment. The benchmark assesses an agent's ability to handle flight-related tasks and provide customer support while staying consistent and accurate across multiple conversation turns.
Evaluation Stats
Total Models: 10
Organizations: 4
Verified Results: 0
Self-Reported: 10
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers
Score Distribution: 10 models
Top Score: 64.8%
Average Score: 55.8%
High Performers (80%+): 0
Top Organizations
#1 Anthropic (1 model): 63.6%
#2 OpenAI (3 models): 57.6%
#3 Moonshot AI (2 models): 56.5%
#4 Alibaba Cloud / Qwen Team (4 models): 52.0%
Leaderboard
10 models ranked by performance on Tau2 Airline
| Date | License | Score |
|---|---|---|
| Apr 16, 2025 | Proprietary | 64.8% |
| Oct 15, 2025 | Proprietary | 63.6% |
| Aug 7, 2025 | Proprietary | 62.6% |
| Sep 10, 2025 | Apache 2.0 | 60.5% |
| Jul 25, 2025 | Apache 2.0 | 58.0% |
| Jul 11, 2025 | MIT | 56.5% |
| Sep 5, 2025 | MIT | 56.5% |
| Sep 10, 2025 | Apache 2.0 | 45.5% |
| Aug 6, 2024 | Proprietary | 45.5% |
| Jul 22, 2025 | Apache 2.0 | 44.0% |
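
The summary figures under Performance Overview can be rechecked from the percentage scores in the leaderboard table. The sketch below is only an illustration: the `summarize` helper and its `high_threshold` parameter are hypothetical names, not part of the leaderboard site or of τ²-Bench, and the 80% cutoff is taken from the "High Performers (80%+)" label.

```python
from statistics import mean

# Percentage scores copied from the leaderboard table, in ranked order.
scores = [64.8, 63.6, 62.6, 60.5, 58.0, 56.5, 56.5, 45.5, 45.5, 44.0]


def summarize(scores, high_threshold=80.0):
    """Hypothetical helper: recompute the overview statistics from raw scores."""
    return {
        "models": len(scores),
        "top": max(scores),
        "average": mean(scores),
        # Count of models at or above the high-performer cutoff.
        "high_performers": sum(s >= high_threshold for s in scores),
    }


summary = summarize(scores)
print(summary)
```

The mean of the ten scores is about 55.75, which agrees with the 55.8% average shown on the page. No score reaches 80%, which agrees with the high-performer count of 0.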