OSWorld Extended

multimodal
About

OSWorld Extended is an expanded variant of the OSWorld benchmark featuring additional computer tasks and enhanced evaluation scenarios for multimodal agents. This extended version provides more comprehensive coverage of real-world operating system interactions, including advanced workflows and complex multi-application tasks that challenge agents' capabilities in authentic computing environments.

Evaluation Stats
Total Models: 1
Organizations: 1
Verified Results: 0
Self-Reported: 1
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers

Score Distribution

1 model
Top Score: 22.0%
Average Score: 22.0%
High Performers (80%+): 0
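
The summary figures above follow directly from the raw scores. The snippet below is a minimal sketch of that arithmetic, assuming scores are stored on the 0 to 1 scale implied by the Max Score of 1 and that "High Performers" means a score of at least 0.80. The `summarize` function and the `scores_by_org` mapping are hypothetical names chosen for illustration, not part of the leaderboard itself.

```python
from statistics import mean

# Assumed input: the single self-reported result listed on this page,
# expressed on a 0-1 scale (22.0% -> 0.22). The structure is hypothetical.
scores_by_org = {"Anthropic": [0.22]}


def summarize(scores_by_org, high_threshold=0.80):
    """Compute the overview statistics from per-organization score lists."""
    all_scores = [s for org_scores in scores_by_org.values() for s in org_scores]
    return {
        "models": len(all_scores),
        "organizations": len(scores_by_org),
        "top_score": max(all_scores),
        "average_score": mean(all_scores),
        # Count models at or above the assumed 80% threshold.
        "high_performers": sum(1 for s in all_scores if s >= high_threshold),
    }


summary = summarize(scores_by_org)
print(f"Top score: {summary['top_score']:.1%}")
print(f"Average score: {summary['average_score']:.1%}")
print(f"High performers (80%+): {summary['high_performers']}")
```

With only one model ranked, the top and average scores coincide, which is why both are listed as 22.0%.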

Top Organizations

#1 Anthropic: 1 model, 22.0%
Leaderboard
1 model ranked by performance on OSWorld Extended
Date: Oct 22, 2024 | License: Proprietary | Score: 22.0%
Resources