AndroidWorld_SR

multimodal
About

AndroidWorld is a dynamic benchmarking environment for autonomous Android agents, featuring 116 programmatic tasks across 20 real-world Android apps; AndroidWorld_SR reports the success rate (SR) on those tasks. It evaluates an AI agent's ability to control a mobile device through diverse tasks such as recording audio, adding expenses, creating map markers, and managing calendars. The benchmark provides durable reward signals and measures success rates for complex multi-step interactions, with baseline agents completing about 30.6% of tasks.
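As a rough illustration of the metric, success rate can be read as the fraction of tasks whose programmatic check passed. The sketch below assumes each task yields a single pass/fail outcome; the `success_rate` helper and the example outcomes are hypothetical and are not taken from the AndroidWorld codebase.

```python
def success_rate(outcomes):
    """Return the fraction of task outcomes that are successes.

    `outcomes` is a list of booleans, one per task (assumed format).
    """
    if not outcomes:
        raise ValueError("no task outcomes given")
    return sum(1 for ok in outcomes if ok) / len(outcomes)


# Hypothetical run over 116 tasks, 35 of which passed (made-up numbers).
outcomes = [True] * 35 + [False] * 81
rate = success_rate(outcomes)
print(f"Success rate: {rate:.1%}")
```

A score reported this way is bounded by 0 and 1, which is consistent with the maximum score of 1 listed for this benchmark.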

Evaluation Stats
Total Models: 3
Organizations: 1
Verified Results: 0
Self-Reported: 3
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers

Score Distribution

Models: 3
Top Score: 35.0%
Average Score: 27.5%
High Performers (80%+): 0
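The summary figures can be checked against the three leaderboard scores (35.0%, 25.5%, 22.0%). The following is a minimal sketch, assuming the average is a plain arithmetic mean and that "high performer" means a score of at least 80%; the variable names are illustrative only.

```python
# The three self-reported scores listed on the leaderboard, in percent.
scores = [35.0, 25.5, 22.0]

top_score = max(scores)
average_score = sum(scores) / len(scores)  # (35.0 + 25.5 + 22.0) / 3
high_performers = sum(1 for s in scores if s >= 80.0)

print(f"Top: {top_score:.1f}%")
print(f"Average: {average_score:.1f}%")
print(f"High performers (80%+): {high_performers}")
```

Under these assumptions the computed values agree with the reported top score, average score, and high-performer count.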

Top Organizations

#1 Alibaba Cloud / Qwen Team (3 models, 27.5%)
Leaderboard
3 models ranked by performance on AndroidWorld_SR
Date          License         Score
Jan 26, 2025  tongyi-qianwen  35.0%
Jan 26, 2025  Apache 2.0      25.5%
Feb 28, 2025  Apache 2.0      22.0%
Resources