MM-MT-Bench

Tags: multimodal
About

A multi-turn, LLM-as-a-judge evaluation benchmark that tests how well multimodal instruction-tuned models follow user instructions across dialogue turns and answer open-ended questions zero-shot.
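
The page doesn't describe the evaluation harness itself; the sketch below shows what a generic multi-turn, LLM-as-a-judge loop looks like. All names here (judge_score, evaluate, the message format, the 1-10 rating scale) are illustrative assumptions, not the official MM-MT-Bench implementation.

```python
# Minimal sketch of a multi-turn, LLM-as-a-judge evaluation loop.
# Everything below is an assumption for illustration; the real harness
# uses a strong judge model and multimodal (image + text) user turns.
from typing import Callable

Message = dict[str, str]  # {"role": ..., "content": ...}

def judge_score(history: list[Message], answer: str) -> float:
    """Stub judge. A real harness prompts a strong LLM to rate the
    answer against the full dialogue context, e.g. on a 1-10 scale."""
    return 8.0  # placeholder rating

def evaluate(model: Callable[[list[Message]], str],
             conversations: list[list[str]]) -> float:
    """Feed each conversation to the model turn by turn, rate every
    answer with the judge, and rescale the mean rating to 0-100."""
    ratings: list[float] = []
    for user_turns in conversations:
        history: list[Message] = []
        for user_msg in user_turns:
            history.append({"role": "user", "content": user_msg})
            answer = model(history)  # zero-shot: no in-context examples
            ratings.append(judge_score(history, answer))
            history.append({"role": "assistant", "content": answer})
    return 100.0 * (sum(ratings) / len(ratings)) / 10.0

# Example: a trivial stand-in "model" over one two-turn conversation.
if __name__ == "__main__":
    convo = ["Describe the attached chart.", "Now summarize it in one line."]
    print(evaluate(lambda history: "A placeholder answer.", [convo]))  # 80.0
```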

Evaluation Stats
Total Models: 3
Organizations: 2
Verified Results: 0
Self-Reported: 3
Benchmark Details
Max Score: 100
Language: en
Performance Overview
Score distribution and top performers

Score Distribution (3 models)
Top Score: 74.0%
Average Score: 46.8%
High Performers (80%+): 0
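
The average is consistent with an unweighted mean of the three self-reported scores on the leaderboard below; that aggregation rule is an assumption, since the page doesn't state it.

```python
# Assumption: "Average Score" is the plain mean of the three
# self-reported leaderboard scores below.
scores = [74.0, 60.5, 6.0]
print(round(sum(scores) / len(scores), 1))  # 46.8
```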

Top Organizations

#1 Mistral AI: 2 models, 67.3% average
#2 Alibaba Cloud / Qwen Team: 1 model, 6.0% average
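
The per-organization figure is likewise consistent with a plain mean (again an assumption about how the page aggregates): Mistral AI's two leaderboard scores average to 67.25, displayed as 67.3%.

```python
# Assumption: an organization's score is the mean of its models' scores.
mistral_scores = [74.0, 60.5]  # the two Mistral AI entries below
print(sum(mistral_scores) / len(mistral_scores))  # 67.25, shown as 67.3%
```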
Leaderboard
3 models ranked by performance on MM-MT-Bench

#1 Mistral AI: 74.0%
   Released: Nov 18, 2024
   License: Mistral Research License (MRL) for research; Mistral Commercial License for commercial use
#2 Mistral AI: 60.5%
   Released: Sep 17, 2024
   License: Apache 2.0
#3 Alibaba Cloud / Qwen Team: 6.0%
   Released: Mar 27, 2025
   License: Apache 2.0