AITZ_EM

multimodal
About

AITZ_EM appears to be an AI evaluation benchmark in which "EM" likely stands for Exact Match, a standard metric, commonly used in question-answering tasks, that reports the proportion of predictions exactly matching the ground-truth answers. The benchmark's specific methodology, scope, and evaluation criteria are not confirmed here and need further verification; it may be proprietary or newly introduced.
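
To make the metric concrete, here is a minimal sketch of Exact Match scoring in Python. It is an illustration only: the function names and the normalization step (lowercasing and collapsing whitespace) are assumptions, since the actual matching rule used by AITZ_EM is not documented on this page.

```python
from typing import Sequence


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace (an assumed normalization, not the benchmark's)."""
    return " ".join(text.lower().split())


def exact_match(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Return the fraction of predictions that equal their reference after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(references)


# Hypothetical example data: one of two predictions matches its reference.
score = exact_match(["Tap  Settings", "scroll down"], ["tap settings", "scroll up"])
print(f"{score:.1%}")
```

A score computed this way lies between 0 and 1, which is consistent with the maximum score of 1 listed under Benchmark Details and with the percentages shown on the leaderboard.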

Evaluation Stats
Total Models: 3
Organizations: 1
Verified Results: 0
Self-Reported: 3
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers

Score Distribution

3 models
Top Score: 83.2%
Average Score: 82.7%
High Performers (80%+): 3
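
As a quick check, the summary figures can be reproduced from the three leaderboard scores (83.2%, 83.1%, 81.9%). The sketch below assumes the average is a plain arithmetic mean rounded to one decimal place; the variable names are illustrative.

```python
# Scores as listed on the leaderboard, in percent.
scores = [83.2, 83.1, 81.9]

top_score = max(scores)
# Assumed: simple arithmetic mean, rounded to one decimal place.
average_score = round(sum(scores) / len(scores), 1)
# Count of models at or above the 80% threshold.
high_performers = sum(s >= 80.0 for s in scores)

print(top_score, average_score, high_performers)
```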

Top Organizations

#1 Alibaba Cloud / Qwen Team
3 models
82.7%
Leaderboard
3 models ranked by performance on AITZ_EM
Date            License           Score
Jan 26, 2025    tongyi-qianwen    83.2%
Feb 28, 2025    Apache 2.0        83.1%
Jan 26, 2025    Apache 2.0        81.9%
Resources