MusicCaps

multimodal
About

MusicCaps is a music captioning dataset and benchmark for evaluating models' ability to generate descriptive text about musical audio. It features high-quality human-written captions describing musical characteristics, instruments, genres, and acoustic properties, enabling assessment of models' musical understanding and audio-to-text generation capabilities.

Evaluation Stats
Total Models: 1
Organizations: 1
Verified Results: 0
Self-Reported: 1
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers

Score Distribution

Models: 1
Top Score: 32.8%
Average Score: 32.8%
High Performers (80%+): 0

Top Organizations

#1 Alibaba Cloud / Qwen Team: 1 model, 32.8%
Leaderboard
1 model ranked by performance on MusicCaps
Date: Mar 27, 2025
License: Apache 2.0
Score: 32.8%
Resources