VocalSound

audio

About

VocalSound is an audio classification benchmark of 21,024 crowdsourced recordings of six human vocal sounds (laughter, sighs, coughs, throat clearing, sneezes, and sniffs) from 3,365 unique subjects. It tests a model's ability to recognize and classify non-verbal human vocalizations. The dataset also includes metadata on speaker demographics, health conditions, and acoustic characteristics.
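As a rough illustration of how a six-class score like the ones on this page could be computed, the sketch below assumes the reported score is plain classification accuracy, which this page does not state. The label strings and the function names `accuracy` and `per_class_accuracy` are hypothetical and are not taken from the benchmark's own tooling.

```python
from collections import Counter

# The six classes named in the description. These exact label strings are
# illustrative assumptions, not the dataset's official identifiers.
LABELS = ["laughter", "sigh", "cough", "throat_clearing", "sneeze", "sniff"]


def accuracy(predictions, references):
    """Return the fraction of clips whose predicted label equals the reference."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("at least one labeled clip is required")
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)


def per_class_accuracy(predictions, references):
    """Return accuracy per reference label, skipping labels with no clips."""
    totals = Counter(references)
    hits = Counter(r for p, r in zip(predictions, references) if p == r)
    return {label: hits[label] / totals[label] for label in LABELS if totals[label]}


# Toy data, invented for this example only.
refs = ["laughter", "cough", "sneeze", "sniff"]
preds = ["laughter", "cough", "sniff", "sniff"]

print(f"overall: {accuracy(preds, refs):.1%}")
print(per_class_accuracy(preds, refs))
```

Reporting per-class accuracy alongside the overall figure helps show whether acoustically similar classes, such as sneezes and sniffs in the toy data, are being confused.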

Evaluation Stats

Total Models: 1

Organizations: 1

Verified Results: 0

Self-Reported: 1

Benchmark Details

Max Score: 1

Language

Performance Overview

Score distribution and top performers

Score Distribution

1 model

Top Score

93.9%

Average Score

93.9%

High Performers (80%+)

Top Organizations

#1 Alibaba Cloud / Qwen Team

1 model

93.9%

Leaderboard

1 model ranked by performance on VocalSound

Rank	Model	Organization	Date	License	Score
#01	Qwen2.5-Omni-7B	Alibaba Cloud / Qwen Team	Mar 27, 2025	Apache 2.0	93.9%

Resources

Research Paper