MuirBench

multimodal

About

MuirBench is a benchmark for robust multi-image understanding in multimodal large language models. It consists of 12 distinct tasks that evaluate a model's ability to process, compare, and reason across multiple images at once.

Evaluation Stats

Total Models: 1

Organizations: 1

Verified Results: 0

Self-Reported: 1

Benchmark Details

Max Score: 1

Language

Performance Overview

Score distribution and top performers

Score Distribution

1 model

Top Score

59.2%

Average Score

59.2%

High Performers (80%+): 0

Top Organizations

#1 Alibaba Cloud / Qwen Team

1 model

59.2%

Leaderboard

1 model ranked by performance on MuirBench

Rank	Model	Organization	Date	License	Score
#1	Qwen2.5-Omni-7B	Alibaba Cloud / Qwen Team	Mar 27, 2025	Apache 2.0	59.2%
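
The summary figures in the performance overview (top score, average score, high performers) follow directly from the leaderboard rows. A minimal Python sketch of that arithmetic is shown here. It assumes scores are stored on a 0 to 1 scale (consistent with the listed maximum score of 1) and displayed as percentages. The tuple layout, the `summarize` helper, and the threshold constant are hypothetical names used only for illustration, not part of any MuirBench tooling.

```python
from statistics import mean

# Leaderboard rows as (model, organization, score).
# Assumption: scores are on a 0-1 scale and shown as percentages on the page.
leaderboard = [
    ("Qwen2.5-Omni-7B", "Alibaba Cloud / Qwen Team", 0.592),
]

# Cut-off for the "High Performers (80%+)" bucket.
HIGH_PERFORMER_THRESHOLD = 0.80


def summarize(rows):
    """Compute the overview statistics from a list of leaderboard rows."""
    scores = [score for _, _, score in rows]
    return {
        "total_models": len(rows),
        "organizations": len({org for _, org, _ in rows}),
        "top_score": max(scores),
        "average_score": mean(scores),
        "high_performers": sum(s >= HIGH_PERFORMER_THRESHOLD for s in scores),
    }


summary = summarize(leaderboard)
print(f"Top score: {summary['top_score']:.1%}")
print(f"Average score: {summary['average_score']:.1%}")
print(f"High performers (80%+): {summary['high_performers']}")
```

With a single entry, the top score and the average score coincide, and no model reaches the 80% bucket.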
