OSWorld Extended
multimodal
About
OSWorld Extended is an expanded variant of the OSWorld benchmark for multimodal agents. It adds computer tasks and evaluation scenarios beyond the original set, aiming for broader coverage of real-world operating system interactions, including advanced workflows and complex multi-application tasks performed in authentic computing environments.
Evaluation Stats
Total Models: 1
Organizations: 1
Verified Results: 0
Self-Reported: 1
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers
Score Distribution
1 model
Top Score: 22.0%
Average Score: 22.0%
High Performers (80%+): 0
Top Organizations
#1 Anthropic: 1 model, 22.0%
Leaderboard
1 model ranked by performance on OSWorld Extended

Organization | Date | License | Score
---|---|---|---
Anthropic | Oct 22, 2024 | Proprietary | 22.0%