UI-TARS-2
Multimodal
by ByteDance
About
UI-TARS-2, released by ByteDance in September 2025, is a major upgrade to the UI-TARS model family. It is built specifically for GUI agent tasks: it interacts with software interfaces by perceiving screenshots and performing human-like actions, without requiring access to structured accessibility APIs. UI-TARS established a distinct niche by taking raw screenshots as its only input, which makes it applicable to GUI automation across any application, regardless of whether the underlying software exposes programmatic access.
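The screenshot-in, action-out loop described above can be illustrated with a minimal sketch. This is not the UI-TARS-2 API or its actual action space: `Action`, `FakeDesktop`, `scripted_policy`, and `run_agent` are hypothetical names invented for illustration, and the policy is a fixed script standing in for a model call.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    """Hypothetical action record; the real model's action space may differ."""
    kind: str          # "click", "type", or "done"
    x: int = 0
    y: int = 0
    text: str = ""


class FakeDesktop:
    """Stand-in environment (assumption): exposes only pixels and input events,
    with no accessibility tree or other structured API."""

    def __init__(self) -> None:
        self.log: List[Action] = []

    def screenshot(self) -> bytes:
        # A real environment would return encoded image bytes of the screen.
        return f"frame-{len(self.log)}".encode()

    def execute(self, action: Action) -> None:
        # A real environment would inject a mouse or keyboard event here.
        self.log.append(action)


Policy = Callable[[str, bytes, List[Action]], Action]


def scripted_policy(task: str, screenshot: bytes, history: List[Action]) -> Action:
    """Placeholder for a model call: replays a fixed script instead of
    looking at the screenshot."""
    script = [Action("click", 120, 48), Action("type", text="hello"), Action("done")]
    return script[min(len(history), len(script) - 1)]


def run_agent(task: str, env: FakeDesktop, policy: Policy, max_steps: int = 10) -> List[Action]:
    """Perceive-act loop: capture a screenshot, ask the policy for the next
    action, execute it, and stop on "done" or after max_steps."""
    history: List[Action] = []
    for _ in range(max_steps):
        action = policy(task, env.screenshot(), history)
        history.append(action)
        if action.kind == "done":
            break
        env.execute(action)
    return history
```

The point of the sketch is that the environment interface is limited to `screenshot()` and `execute()`, so the same loop would apply to any application that can be rendered and driven by input events.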
Timeline
Released
Sep 4, 2025
Specifications
Capabilities
Multimodal
License & Family
License
Apache 2.0
Performance Overview
Performance metrics and category breakdown
Overall Performance
1 benchmark
Average Score
53.1%
Best Score
53.1%
High Performers (80%+)
0
Top Categories
Agents
53.1%
All Benchmark Results for UI-TARS-2
Complete list of benchmark scores with detailed information
| Benchmark | Category | Score | Score (%) | Status |
| --- | --- | --- | --- | --- |
| OSWorld | Agents | 53.10 | 53.1% | Unverified |