AndroidWorld_SR
multimodal
About
AndroidWorld_SR reports the success rate (SR) of agents on AndroidWorld, a dynamic benchmarking environment for autonomous Android agents with 116 programmatic tasks across 20 real-world Android apps. It evaluates an agent's ability to control a mobile device through diverse tasks such as recording audio, adding expenses, creating map markers, and managing calendars. The benchmark provides durable reward signals and measures success rates on complex multi-step interactions; baseline agents complete around 30.6% of tasks.
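As a rough illustration of the metric, success rate can be read as the fraction of attempted tasks whose programmatic check passed. The sketch below assumes a simple pass/fail outcome per task; the function name `success_rate` and the sample outcomes are hypothetical and are not taken from the AndroidWorld codebase.

```python
from typing import Sequence


def success_rate(outcomes: Sequence[bool]) -> float:
    """Return the fraction of tasks marked successful (True)."""
    if not outcomes:
        raise ValueError("no task outcomes to score")
    # True counts as 1 and False as 0, so the sum is the number of successes.
    return sum(outcomes) / len(outcomes)


# Hypothetical outcomes for five tasks: two successes, three failures.
outcomes = [True, False, False, True, False]
print(f"{success_rate(outcomes):.1%}")  # prints 40.0%
```

A leaderboard entry such as 35.0% would then correspond to a value of 0.35 on the benchmark's 0 to 1 score scale.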
Evaluation Stats
Total Models: 3
Organizations: 1
Verified Results: 0
Self-Reported: 3
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers
Score Distribution: 3 models
Top Score: 35.0%
Average Score: 27.5%
High Performers (80%+): 0
Top Organizations
#1 Alibaba Cloud / Qwen Team (3 models, 27.5%)
Leaderboard
3 models ranked by performance on AndroidWorld_SR
Date | License | Score
---|---|---
Jan 26, 2025 | tongyi-qianwen | 35.0%
Jan 26, 2025 | Apache 2.0 | 25.5%
Feb 28, 2025 | Apache 2.0 | 22.0%
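The summary figures in the Performance Overview can be checked against the three leaderboard scores. The sketch below is only an arithmetic check; it assumes the average is an unweighted mean and that "High Performers" counts scores of 80% or more, and the variable names are illustrative.

```python
# Scores in percent, copied from the three leaderboard rows.
scores = [35.0, 25.5, 22.0]

top_score = max(scores)
average_score = sum(scores) / len(scores)  # unweighted mean (assumption)
high_performers = sum(1 for s in scores if s >= 80.0)

print(f"Top score: {top_score:.1f}%")          # Top score: 35.0%
print(f"Average score: {average_score:.1f}%")  # Average score: 27.5%
print(f"High performers: {high_performers}")   # High performers: 0
```

Under these assumptions the results match the reported top score of 35.0%, average score of 27.5%, and zero high performers.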