Android Control High_EM

multimodal
About

Android Control High_EM is an evaluation setting of the AndroidControl benchmark that measures how accurately AI agents carry out mobile device control tasks. "High" refers to the high-level instruction setting, in which the agent receives only the overall task goal, and "EM" to exact-match scoring of the agent's predicted actions against the reference actions. The benchmark tests multimodal models on Android app interactions: the agent must interpret a natural language instruction and execute a multi-step task, with tasks drawn from 833 Android applications.
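
As an illustration of exact-match scoring, the sketch below compares predicted actions with reference actions step by step and reports the fraction that match. It is a minimal sketch under stated assumptions, not the benchmark's official scorer: the action fields (`action_type`, `app_name`, `x`, `y`, `text`) and the helper names are hypothetical, and the real evaluation may apply different matching rules, for example a tolerance on click coordinates.

```python
from typing import Any


def normalize(action: dict[str, Any]) -> dict[str, Any]:
    """Strip and lowercase string fields so trivial formatting differences are ignored."""
    return {
        key: value.strip().lower() if isinstance(value, str) else value
        for key, value in action.items()
    }


def exact_match_rate(predicted: list[dict[str, Any]],
                     reference: list[dict[str, Any]]) -> float:
    """Fraction of steps whose predicted action equals the reference action."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same number of steps")
    if not reference:
        return 0.0
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predicted, reference))
    return hits / len(reference)


# Hypothetical three-step episode (field names are assumptions, not the dataset schema).
reference = [
    {"action_type": "open_app", "app_name": "Clock"},
    {"action_type": "click", "x": 540, "y": 1200},
    {"action_type": "input_text", "text": "7:30"},
]
predicted = [
    {"action_type": "open_app", "app_name": "clock "},  # matches after normalization
    {"action_type": "click", "x": 540, "y": 1180},      # wrong y coordinate, no match
    {"action_type": "input_text", "text": "7:30"},
]

score = exact_match_rate(predicted, reference)
print(f"{score:.1%}")
```

Strict equality is the simplest possible matching rule; it is chosen here only to keep the sketch short.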

Evaluation Stats
Total Models: 3
Organizations: 1
Verified Results: 0
Self-Reported: 3
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers

Score Distribution

3 models
Top Score
69.6%
Average Score
65.7%
High Performers (80%+)
0

Top Organizations

#1 Alibaba Cloud / Qwen Team
3 models
65.7%
Leaderboard
3 models ranked by performance on Android Control High_EM
Date | License | Score
Feb 28, 2025 | Apache 2.0 | 69.6%
Jan 26, 2025 | tongyi-qianwen | 67.4%
Jan 26, 2025 | Apache 2.0 | 60.1%
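
The summary figures in the Performance Overview can be reproduced from the three leaderboard scores. The snippet below is a small sketch of that arithmetic; the variable names are illustrative, and the 80% threshold is taken from the "High Performers (80%+)" label.

```python
from statistics import mean

# Leaderboard scores in percent, as listed on this page.
scores = [69.6, 67.4, 60.1]

top_score = max(scores)
average_score = round(mean(scores), 1)
high_performers = sum(s >= 80.0 for s in scores)  # models at or above 80%

print(f"Top Score: {top_score}%")
print(f"Average Score: {average_score}%")
print(f"High Performers (80%+): {high_performers}")
```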