Android Control High_EM
multimodal
About
Android Control High-EM is an evaluation setting of the AndroidControl benchmark that measures how precisely AI agents can control a mobile device, scored with strict exact-match criteria. It tests multimodal models on complex Android app interactions: an agent must interpret a natural-language instruction and execute a multi-step task accurately, with tasks drawn from 833 different Android applications.
Evaluation Stats
Total Models: 3
Organizations: 1
Verified Results: 0
Self-Reported: 3
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers
Score Distribution: 3 models
Top Score: 69.6%
Average Score: 65.7%
High Performers (80%+): 0
Top Organizations
#1 Alibaba Cloud / Qwen Team: 3 models, 65.7%
Leaderboard
3 models ranked by performance on Android Control High_EM
Release Date | License | Score
---|---|---
Feb 28, 2025 | Apache 2.0 | 69.6%
Jan 26, 2025 | tongyi-qianwen | 67.4%
Jan 26, 2025 | Apache 2.0 | 60.1%
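
As a quick consistency check, the summary figures in the Performance Overview can be recomputed from the three leaderboard scores. The sketch below is only illustrative: the variable names are hypothetical, and it assumes the reported average is a plain arithmetic mean rounded to one decimal place, which the page does not state explicitly.

```python
# Scores (in percent) copied from the three leaderboard rows.
scores = [69.6, 67.4, 60.1]

# Assumption: "Average Score" is the unweighted mean of the listed scores.
average = sum(scores) / len(scores)

# "Top Score" is the maximum; "High Performers" counts scores of 80% or more.
top_score = max(scores)
high_performers = sum(1 for s in scores if s >= 80.0)

print(f"Top score: {top_score:.1f}%")
print(f"Average score: {average:.1f}%")
print(f"High performers (80%+): {high_performers}")
```

Under that assumption the recomputed values agree with the figures shown on the page.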