UI-TARS-2

Multimodal

by ByteDance

About

UI-TARS-2, released by ByteDance in September 2025, is a major upgrade to the UI-TARS model family. It is built specifically for GUI agent tasks: the model perceives screenshots of a software interface and performs human-like actions on it, without requiring access to structured accessibility APIs. The UI-TARS family established a distinct niche by taking raw screenshots as its only input, which makes it applicable to GUI automation in any application, whether or not the underlying software exposes programmatic access.

Timeline
Released
Sep 4, 2025
Specifications
Capabilities
Multimodal
License & Family
License
Apache 2.0
Performance Overview
Performance metrics and category breakdown

Overall Performance

1 benchmark
Average Score
53.1%
Best Score
53.1%
High Performers (80%+)
0

Top Categories

Agents
53.1%
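
The summary figures above appear to be simple aggregates over the benchmark list. The sketch below shows one way they could be derived, assuming the average is an arithmetic mean, the category score is a per-category mean, and the "80%+" label implies a cutoff of 80.0. The `summarize` function and the `HIGH_PERFORMER_THRESHOLD` name are hypothetical and are not taken from the page's actual implementation.

```python
from statistics import mean

# Benchmark rows as listed on this page: (name, category, score in percent).
results = [("OSWorld", "Agents", 53.1)]

# Assumed cutoff, inferred from the "High Performers (80%+)" label.
HIGH_PERFORMER_THRESHOLD = 80.0


def summarize(rows):
    """Compute page-style summary statistics from benchmark rows."""
    scores = [score for _, _, score in rows]

    # Group scores by category so each category gets its own mean.
    by_category = {}
    for _, category, score in rows:
        by_category.setdefault(category, []).append(score)

    return {
        "benchmarks": len(scores),
        "average": round(mean(scores), 1),
        "best": max(scores),
        "high_performers": sum(s >= HIGH_PERFORMER_THRESHOLD for s in scores),
        "categories": {c: round(mean(v), 1) for c, v in by_category.items()},
    }


summary = summarize(results)
print(summary)
```

With a single benchmark, the average, best, and category scores all reduce to the same value, which is why every summary figure on this page reads 53.1%.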
All Benchmark Results for UI-TARS-2
Complete list of benchmark scores with detailed information
OSWorld
Agents
53.10
53.1%
Unverified