MMBench
Tags: multilingual, multimodal
About
MMBench is a systematically designed bilingual benchmark for robust evaluation of large vision-language models (VLMs). It features meticulously curated multiple-choice questions in both English and Chinese and employs a rigorous CircularEval strategy, in which the answer options are circularly shifted and a model is credited only if it answers correctly under every shift. The benchmark assesses multimodal perception and reasoning across diverse visual understanding tasks, with well-designed quality-control schemes.
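The CircularEval idea can be sketched in a few lines. This is a hedged illustration, not MMBench's actual implementation: the `model_answer` callable is hypothetical, standing in for whatever inference interface is used.

```python
# Sketch of a CircularEval-style check (assumed interface; model_answer is a
# hypothetical callable that returns the index of the option the model picks).
# A question counts as correct only if the model chooses the right option
# under every circular rotation of the answer choices.

def circular_eval(question, choices, correct_idx, model_answer):
    """Return True only if the model is correct for all rotations of choices."""
    n = len(choices)
    correct_option = choices[correct_idx]
    for shift in range(n):
        # Rotate the option list by `shift` positions.
        rotated = [choices[(i + shift) % n] for i in range(n)]
        # Where the correct option landed after rotation.
        rotated_correct = rotated.index(correct_option)
        if model_answer(question, rotated) != rotated_correct:
            return False
    return True
```

Compared with a single-pass evaluation, this penalizes positional bias: a model that always picks option A passes one rotation but fails the rest.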
Evaluation Stats
Total Models: 7
Organizations: 3
Verified Results: 0
Self-Reported: 7
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers
Score Distribution: 7 models
Top Score: 88.0%
Average Score: 81.4%
High Performers (80%+): 5

Top Organizations
#1 Alibaba Cloud / Qwen Team — 2 models — 86.1%
#2 Microsoft — 2 models — 84.3%
#3 DeepSeek — 3 models — 76.4%
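The overview numbers above can be reproduced directly from the leaderboard scores. A minimal sketch (scores copied from the leaderboard table; model names are omitted in the source, so only the raw scores are used):

```python
# Reproduce the Performance Overview stats from the seven leaderboard scores.
scores = [88.0, 86.7, 84.3, 81.9, 80.3, 79.6, 69.2]

top = max(scores)                              # Top Score: 88.0
avg = round(sum(scores) / len(scores), 1)      # Average Score: 81.4
high = sum(1 for s in scores if s >= 80.0)     # High Performers (80%+): 5
```

The computed values match the overview: 570.0 / 7 ≈ 81.4, and five of the seven scores reach the 80% threshold.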
Leaderboard
7 models ranked by performance on MMBench
Date | License | Score
---|---|---
Jan 26, 2025 | tongyi-qianwen | 88.0%
Feb 1, 2025 | MIT | 86.7%
Jan 26, 2025 | Apache 2.0 | 84.3%
Aug 23, 2024 | MIT | 81.9%
Dec 13, 2024 | deepseek | 80.3%
Dec 13, 2024 | deepseek | 79.6%
Dec 13, 2024 | deepseek | 69.2%