MuirBench

multimodal
About

MuirBench is a comprehensive benchmark focusing on robust multi-image understanding capabilities of multimodal Large Language Models. It consists of 12 distinct tasks designed to evaluate models' ability to process, compare, and reason across multiple images simultaneously, testing advanced multimodal reasoning and visual comprehension skills.

Evaluation Stats
Total Models: 1
Organizations: 1
Verified Results: 0
Self-Reported: 1
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers

Score Distribution

1 model
Top Score: 59.2%
Average Score: 59.2%
High Performers (80%+): 0

Top Organizations

#1 Alibaba Cloud / Qwen Team: 1 model, 59.2%
Leaderboard
1 model ranked by performance on MuirBench
Mar 27, 2025 · License: Apache 2.0 · Score: 59.2%
Resources