MMVet

multimodal

About

MM-Vet is an evaluation benchmark that examines large multimodal models on complicated multimodal tasks by testing how well they integrate six core vision-language capabilities: recognition, knowledge, spatial awareness, language generation, OCR, and math. It scores open-ended outputs with an LLM-based evaluator and covers 16 capability integrations, giving insight into how well models solve complex tasks that require several skills at once.
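To make the scoring idea concrete, here is a minimal sketch of how per-sample evaluator scores could be rolled up into a total score and per-capability scores. It assumes each sample is tagged with the capabilities it requires and receives an evaluator score between 0 and 1; the sample records, field names, and the `aggregate` helper are hypothetical illustrations, not the benchmark's actual data or code.

```python
from collections import defaultdict

# Hypothetical per-sample records: the capabilities a question requires
# and the score (0.0 to 1.0) assigned by an LLM-based evaluator.
samples = [
    {"capabilities": {"recognition", "knowledge"}, "score": 1.0},
    {"capabilities": {"ocr", "math"}, "score": 0.5},
    {"capabilities": {"ocr", "spatial awareness"}, "score": 0.0},
    {"capabilities": {"recognition", "language generation"}, "score": 0.5},
]


def aggregate(samples):
    """Return (total, per_capability) scores as percentages.

    Assumption: the total is the mean score over all samples, and each
    capability's score is the mean over the samples that require it.
    """
    total = 100.0 * sum(s["score"] for s in samples) / len(samples)

    sums = defaultdict(float)
    counts = defaultdict(int)
    for s in samples:
        for cap in s["capabilities"]:
            sums[cap] += s["score"]
            counts[cap] += 1

    per_capability = {cap: 100.0 * sums[cap] / counts[cap] for cap in sums}
    return total, per_capability


total, per_capability = aggregate(samples)
print(f"Total: {total:.1f}%")
for cap in sorted(per_capability):
    print(f"{cap}: {per_capability[cap]:.1f}%")
```

Because a sample can require several capabilities at once, it counts toward each of them, so per-capability scores do not have to average out to the total.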

Evaluation Stats

Total Models: 2

Organizations: 1

Verified Results: 0

Self-Reported: 2

Benchmark Details

Max Score: 1

Language

Performance Overview

Score distribution and top performers

Score Distribution

2 models

Top Score

76.2%

Average Score

71.6%

High Performers (80%+): 0

Top Organizations

#1 Alibaba Cloud / Qwen Team

2 models

71.6%

Leaderboard

2 models ranked by performance on MMVet

Rank	Model	Organization	Date	License	Score
#01	Qwen2.5 VL 72B Instruct	Alibaba Cloud / Qwen Team	Jan 26, 2025	tongyi-qianwen	76.2%
#02	Qwen2.5 VL 7B Instruct	Alibaba Cloud / Qwen Team	Jan 26, 2025	Apache 2.0	67.1%
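
The summary figures in the Performance Overview can be cross-checked against the two leaderboard rows above. The sketch below is only an illustration: the `leaderboard` list is typed in by hand from those rows, and the 80% cutoff is the "High Performers" threshold shown on the page.

```python
from statistics import mean

# Scores copied by hand from the leaderboard rows (percent).
leaderboard = [
    ("Qwen2.5 VL 72B Instruct", 76.2),
    ("Qwen2.5 VL 7B Instruct", 67.1),
]

scores = [score for _, score in leaderboard]

# Best-scoring entry, mean score, and number of models at or above 80%.
top_model, top_score = max(leaderboard, key=lambda row: row[1])
average = mean(scores)
high_performers = sum(score >= 80.0 for score in scores)

print(f"Top: {top_model} ({top_score}%)")
print(f"Average: {average:.2f}%")
print(f"High performers (80%+): {high_performers}")
```

The unrounded mean of the two scores is 71.65, which the page reports as 71.6%, and neither model reaches the 80% threshold.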
