OSWorld

Agents
About

OSWorld evaluates multimodal AI agents on real computer tasks across web browsers, office suites, and OS interfaces using GUI interaction, with success rate as the primary metric.
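As a rough illustration of the primary metric, the sketch below computes a success rate as the percentage of tasks an agent completes. The function name `success_rate` and the boolean per-task outcomes are assumptions made for this example; the page does not describe how individual tasks are graded.

```python
def success_rate(outcomes):
    """Return the percentage of successful tasks, rounded to one decimal.

    `outcomes` is a list of booleans, one per attempted task
    (a hypothetical input format, not OSWorld's actual result schema).
    """
    if not outcomes:
        raise ValueError("at least one task outcome is required")
    return round(100.0 * sum(outcomes) / len(outcomes), 1)


# Example: an agent that completes 3 of 4 tasks.
example_rate = success_rate([True, True, False, True])
print(example_rate)
```

A score on this scale tops out at 100, which matches the Max Score listed under Benchmark Details.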

Evaluation Stats
Total Models: 13
Organizations: 5
Verified Results: 0
Self-Reported: 4
Benchmark Details
Max Score: 100
Performance Overview
Score distribution and top performers

Score Distribution

Models: 13
Top Score: 72.7%
Average Score: 49.4%
High Performers (80%+): 0
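The top score and the high-performer count can be checked against the ten scores visible in the Leaderboard. The sketch below is only an illustration: the function name `summarize` is hypothetical, and the 80.0 threshold is taken from the "80%+" label. The average is deliberately not recomputed, because three of the 13 models are not shown on the first page of the Leaderboard.

```python
# The ten scores visible on the first page of the Leaderboard (percent).
VISIBLE_SCORES = [72.7, 72.5, 66.3, 63.3, 61.9, 61.4, 53.1, 43.9, 41.6, 40.0]

# Cutoff taken from the "High Performers (80%+)" label.
HIGH_PERFORMER_THRESHOLD = 80.0


def summarize(scores, threshold=HIGH_PERFORMER_THRESHOLD):
    """Return the best score and the number of scores at or above the threshold."""
    return {
        "top_score": max(scores),
        "high_performers": sum(1 for s in scores if s >= threshold),
    }


summary = summarize(VISIBLE_SCORES)
print(summary)
```

Because the Leaderboard is ranked by performance, the models that are not shown score below the visible ones, so both values hold for all 13 models.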

Top Organizations

#1 Anthropic: 5 models, 63.4%
#2 Moonshot AI: 1 model, 63.3%
#3 ByteDance: 3 models, 51.7%
#4 OpenAI: 2 models, 30.6%
#5 Alibaba / Qwen: 2 models, 22.7%
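The per-organization percentages appear to be the mean score of each organization's models. The page does not state this, so treat it as an inference. The sketch below checks it for the two organizations whose models all appear in the Additional Metrics table. The function name `mean_score_by_org` is hypothetical, and the organization labels are normalized to the names used in this list, which is also an assumption.

```python
from collections import defaultdict

# (model, score, organization) rows copied from the Additional Metrics table.
# "ByteDance Seed" is mapped to "ByteDance" to match the list above (assumption).
ROWS = [
    ("Kimi K2.5", 63.3, "Moonshot AI"),
    ("Seed-1.8", 61.9, "ByteDance"),
    ("UI-TARS-2", 53.1, "ByteDance"),
    ("Doubao 1.5 Vision Pro", 40.0, "ByteDance"),
]


def mean_score_by_org(rows):
    """Group scores by organization and return each mean, rounded to one decimal."""
    scores = defaultdict(list)
    for _model, score, org in rows:
        scores[org].append(score)
    return {org: round(sum(vals) / len(vals), 1) for org, vals in scores.items()}


org_means = mean_score_by_org(ROWS)
print(org_means)
```

Under these assumptions the computed means agree with the 51.7% and 63.3% listed for ByteDance and Moonshot AI.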
Leaderboard
13 models ranked by performance on OSWorld
Date           License       Score
Feb 1, 2026    Proprietary   72.7%
Feb 17, 2026   Proprietary   72.5%
Nov 1, 2025    Proprietary   66.3%
Jan 1, 2026    MIT           63.3%
Dec 18, 2025   Proprietary   61.9%
Sep 29, 2025   Proprietary   61.4%
Sep 4, 2025    Apache 2.0    53.1%
May 14, 2025   Proprietary   43.9%
Jan 22, 2026   Apache 2.0    41.6%
Jan 22, 2025   Proprietary   40.0%
Showing 1 to 10 of 13 models
Additional Metrics
Extended metrics for top models on OSWorld
Model                     Score  Max Steps  Model Type     Organization
Kimi K2.5                 63.3   100        General model  Moonshot AI
Seed-1.8                  61.9   100        General model  ByteDance Seed
Claude Sonnet 4.5         61.4   100        General model  Anthropic
UI-TARS-2                 53.1   100        General model  ByteDance Seed
Claude Sonnet 4           43.9   50         General model  Anthropic
Qwen3-VL Flash            41.6   100        General model  Qwen Team, Alibaba Group
Doubao 1.5 Vision Pro     40.0   100        General model  ByteDance Seed
o3                        23.0   100        General model  OpenAI
Qwen2.5-VL 32B Instruct   3.9    15         General model  Alibaba Cloud, Qwen Team