MMBench

Multilingual · Multimodal
About

MMBench is a systematically designed bilingual benchmark for robust evaluation of large vision-language models. It features meticulously curated multiple-choice questions in both English and Chinese, and employs a rigorous CircularEval strategy, together with carefully designed quality-control schemes, to ensure accurate assessment. The benchmark covers diverse visual understanding tasks, probing both the multimodal perception and the reasoning abilities of VLMs.
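The CircularEval strategy mentioned above can be sketched in a few lines: a model is only credited with a correct answer if it picks the right option under every circular shift of the choice list, which filters out position-biased guessing. This is a minimal illustration of the idea; the function and variable names are my own, not MMBench's actual API.

```python
def circular_eval(choices, correct_idx, model_fn):
    """Return True only if model_fn answers correctly for all rotations.

    model_fn takes the rotated choice list and returns the index it picks.
    (Illustrative sketch, not MMBench's real interface.)
    """
    correct = choices[correct_idx]
    for shift in range(len(choices)):
        # Rotate the options so each one appears in each position once.
        rotated = choices[shift:] + choices[:shift]
        picked = model_fn(rotated)
        if rotated[picked] != correct:
            return False  # one failed rotation invalidates the answer
    return True

choices = ["cat", "dog", "bird", "fish"]
oracle = lambda opts: opts.index("dog")  # always finds the right answer
always_a = lambda opts: 0                # position-biased: always picks "A"

print(circular_eval(choices, 1, oracle))    # True
print(circular_eval(choices, 1, always_a))  # False
```

A single-pass check would credit the biased model one time in four by luck; requiring all rotations to succeed removes that inflation.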

Evaluation Stats

Total Models: 7
Organizations: 3
Verified Results: 0
Self-Reported: 7
Benchmark Details

Max Score: 1
Language: en
Performance Overview

Score distribution and top performers (7 models)

Top Score: 88.0%
Average Score: 81.4%
High Performers (80%+): 5

Top Organizations

#1 Alibaba Cloud / Qwen Team (2 models): 86.1%
#2 Microsoft (2 models): 84.3%
#3 DeepSeek (3 models): 76.4%
Leaderboard
7 models ranked by performance on MMBench
Rank  Date          License         Score
#1    Jan 26, 2025  tongyi-qianwen  88.0%
#2    Feb 1, 2025   MIT             86.7%
#3    Jan 26, 2025  Apache 2.0      84.3%
#4    Aug 23, 2024  MIT             81.9%
#5    Dec 13, 2024  deepseek        80.3%
#6    Dec 13, 2024  deepseek        79.6%
#7    Dec 13, 2024  deepseek        69.2%
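The summary figures in the Performance Overview can be reproduced directly from the seven leaderboard scores; this small check uses only the score column as listed above.

```python
# Scores copied from the leaderboard table, highest to lowest.
scores = [88.0, 86.7, 84.3, 81.9, 80.3, 79.6, 69.2]

top = max(scores)                                  # top score
average = round(sum(scores) / len(scores), 1)      # mean, one decimal
high_performers = sum(s >= 80.0 for s in scores)   # models at 80% or above

print(top, average, high_performers)  # 88.0 81.4 3... see assertions below
```

The mean of the seven scores rounds to 81.4%, matching the page's Average Score, and five of the seven models clear the 80% bar.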
Resources