MuirBench
multimodal
About
MuirBench is a comprehensive benchmark focusing on robust multi-image understanding capabilities of multimodal Large Language Models. It consists of 12 distinct tasks designed to evaluate models' ability to process, compare, and reason across multiple images simultaneously, testing advanced multimodal reasoning and visual comprehension skills.
Evaluation Stats
Total Models: 1
Organizations: 1
Verified Results: 0
Self-Reported: 1
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers
Score Distribution: 1 model
Top Score: 59.2%
Average Score: 59.2%
High Performers (80%+): 0
Top Organizations
#1 Alibaba Cloud / Qwen Team: 1 model, 59.2%
Leaderboard
1 model ranked by performance on MuirBench
Date | License | Score |
---|---|---|
Mar 27, 2025 | Apache 2.0 | 59.2% |