MM-MT-Bench
multimodal
About
A multi-turn LLM-as-a-judge evaluation benchmark for testing multimodal instruction-tuned models' ability to follow user instructions in multi-turn dialogues and answer open-ended questions in a zero-shot manner.
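As a rough illustration of the LLM-as-a-judge protocol described above, the sketch below shows how per-turn judge ratings might be parsed and aggregated into a percentage score. This is a hypothetical minimal sketch, not the official MM-MT-Bench harness: the prompt wording, the 1-10 rating scale, and the function names are all assumptions.

```python
import re

# Hypothetical judge prompt template (an assumption, not the official one).
JUDGE_PROMPT = (
    "Rate the assistant's reply to the final user turn on a scale of 1-10.\n"
    "Conversation:\n{conversation}\n\nAnswer with 'Rating: <n>'."
)

def parse_rating(judge_output: str) -> int:
    """Extract the numeric rating from the judge model's free-form output."""
    match = re.search(r"Rating:\s*(\d+)", judge_output)
    if not match:
        raise ValueError(f"no rating found in: {judge_output!r}")
    return int(match.group(1))

def score_dialogue(turn_ratings: list[int]) -> float:
    """Average per-turn ratings and rescale a 1-10 scale to 0-100."""
    avg = sum(turn_ratings) / len(turn_ratings)
    return (avg - 1) / 9 * 100

# Example: two judged turns of one multi-turn dialogue.
ratings = [parse_rating("Rating: 8"),
           parse_rating("The reply is helpful. Rating: 7")]
print(round(score_dialogue(ratings), 1))
```

In a real harness, each dialogue turn would be sent to a judge model with the conversation so far, and dialogue-level scores would be averaged across the benchmark to produce the percentages shown below.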
Evaluation Stats
Total Models: 3
Organizations: 2
Verified Results: 0
Self-Reported: 3
Benchmark Details
Max Score: 100
Language: en
Performance Overview
Score distribution and top performers
Score Distribution: 3 models
Top Score: 74.0%
Average Score: 46.8%
High Performers (80%+): 0

Top Organizations
#1 Mistral AI (2 models, average 67.3%)
#2 Alibaba Cloud / Qwen Team (1 model, average 6.0%)
Leaderboard
3 models ranked by performance on MM-MT-Bench
Release Date | License | Score
---|---|---
Nov 18, 2024 | Mistral Research License (MRL) for research; Mistral Commercial License for commercial use | 74.0%
Sep 17, 2024 | Apache 2.0 | 60.5%
Mar 27, 2025 | Apache 2.0 | 6.0%