MMAU

Tags: multimodal
About

MMAU (Massive Multi-Task Audio Understanding and Reasoning Benchmark) is a comprehensive benchmark featuring 10,000 carefully curated audio clips with human-annotated questions and answers spanning speech, environmental sounds, and music. It evaluates multimodal audio understanding models on tasks requiring expert-level knowledge and complex reasoning across 27 distinct skills, challenging models to demonstrate advanced audio perception and domain-specific understanding.
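Since MMAU questions are multiple-choice, a model's benchmark score reduces to answer accuracy, optionally broken down by skill. The sketch below illustrates that scoring scheme; the record fields (`answer`, `prediction`, `skill`) are illustrative assumptions, not the official MMAU schema or evaluation code.

```python
from collections import defaultdict

def score_benchmark(records):
    """Compute overall and per-skill accuracy (%) for multiple-choice records.

    Each record is assumed to carry the gold choice, the model's predicted
    choice, and a skill label -- field names here are hypothetical.
    """
    total, correct = 0, 0
    by_skill = defaultdict(lambda: [0, 0])  # skill -> [correct, total]
    for r in records:
        hit = r["prediction"].strip().lower() == r["answer"].strip().lower()
        total += 1
        correct += hit
        by_skill[r["skill"]][0] += hit
        by_skill[r["skill"]][1] += 1
    overall = 100.0 * correct / total if total else 0.0
    per_skill = {s: 100.0 * c / n for s, (c, n) in by_skill.items()}
    return overall, per_skill

# Tiny illustrative run on made-up records:
records = [
    {"skill": "music", "answer": "A", "prediction": "A"},
    {"skill": "music", "answer": "B", "prediction": "C"},
    {"skill": "speech", "answer": "D", "prediction": "D"},
]
overall, per_skill = score_benchmark(records)
```

A leaderboard figure such as 65.6% corresponds to `overall` computed this way over all 10,000 clips.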

Evaluation Stats
Total Models: 1
Organizations: 1
Verified Results: 0
Self-Reported: 1
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers

Score Distribution: 1 model
Top Score: 65.6%
Average Score: 65.6%
High Performers (80%+): 0

Top Organizations

#1 Alibaba Cloud / Qwen Team: 1 model, 65.6%
Leaderboard
1 model ranked by performance on MMAU

Date: Mar 27, 2025
License: Apache 2.0
Score: 65.6%
Resources