Comprehensive side-by-side LLM comparison
UI-TARS-2 leads with a 14.9% higher average benchmark score, although the two models share only one common benchmark. On that basis, UI-TARS-2 is rated the stronger choice for coding tasks.
OpenAI
GPT-5.2, released by OpenAI on December 11, 2025, is a large language model from the GPT-5 family that improves on GPT-5 in general intelligence, long-context understanding, agentic tool-calling, and vision. It features a 400K token context window, 128K maximum output tokens, and a knowledge cutoff of August 2025. GPT-5.2 targets long-context coding tasks, extended document analysis, and complex agentic workflows requiring reliable instruction following.
ByteDance
UI-TARS-2, released by ByteDance in September 2025, is a major generational upgrade of the UI-TARS family of GUI interaction models, with enhanced capabilities across computer control, game environments, code generation, and tool use. It targets agentic workflows requiring robust multimodal understanding of graphical interfaces across diverse application domains.
Release dates
UI-TARS-2 (ByteDance): 2025-09-04
GPT-5.2 (OpenAI): 2025-12-11, 3 months newer
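The "3 months newer" label can be checked against the two release dates listed above. The following is a minimal sketch, not the site's own method: the helper name `whole_months_between` and the choice to count only complete calendar months are assumptions made for illustration.

```python
from datetime import date


def whole_months_between(earlier: date, later: date) -> int:
    """Count complete calendar months from `earlier` to `later` (assumed convention)."""
    months = (later.year - earlier.year) * 12 + (later.month - earlier.month)
    # Drop the final month if its day-of-month has not been reached yet.
    if later.day < earlier.day:
        months -= 1
    return months


# Release dates as given in the comparison.
ui_tars_2_release = date(2025, 9, 4)
gpt_5_2_release = date(2025, 12, 11)

gap_months = whole_months_between(ui_tars_2_release, gpt_5_2_release)
gap_days = (gpt_5_2_release - ui_tars_2_release).days

print(f"GPT-5.2 is {gap_months} months ({gap_days} days) newer than UI-TARS-2")
```

Counting complete months rather than rounding days divided by 30 keeps the result stable across months of different lengths. Under either convention the gap here comes out to 3 months.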
Average performance across 1 common benchmark (GPT-5.2 vs. UI-TARS-2)
Performance comparison across key benchmark categories (GPT-5.2 vs. UI-TARS-2)
GPT-5.2 knowledge cutoff: 2025-08
Available providers and their performance metrics (GPT-5.2, UI-TARS-2)