Alibaba Cloud / Qwen Team

Qwen2.5-Omni-7B

Multimodal
Zero-eval
#1 VocalSound
#1 GiantSteps Tempo
#1 MMBench-V1.1
+24 more

About

Qwen2.5-Omni-7B is a 7-billion-parameter multimodal model that handles text, audio, and other modalities in a single model. It is designed for integrated understanding across diverse input types, rather than being limited to text-only or vision-language use.

Timeline
Announced: Mar 27, 2025
Released: Mar 27, 2025
Specifications
Capabilities: Multimodal
License & Family
License: Apache 2.0
Performance Overview
Performance metrics and category breakdown

Overall Performance

45 benchmarks
Average Score: 59.2%
Best Score: 95.2%
High Performers (80%+): 8
All Benchmark Results for Qwen2.5-Omni-7B
Complete list of benchmark scores with detailed information
Benchmark         Category    Score  Percent  Source
DocVQA            multimodal  0.95   95.2%    Self-reported
VocalSound        audio       0.94   93.9%    Self-reported
GSM8k             text        0.89   88.7%    Self-reported
GiantSteps Tempo  audio       0.88   88.0%    Self-reported
ChartQA           multimodal  0.85   85.3%    Self-reported
TextVQA           multimodal  0.84   84.4%    Self-reported
AI2D              multimodal  0.83   83.2%    Self-reported
MMBench-V1.1      multimodal  0.82   81.8%    Self-reported
HumanEval         text        0.79   78.7%    Self-reported
CRPErelation      text        0.77   76.5%    Self-reported
Showing 1 to 10 of 45 benchmarks
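
As a rough illustration of how summary figures like the ones under Overall Performance could be derived, the sketch below recomputes them from the ten rows listed above. It assumes a plain unweighted mean and an inclusive 80% threshold, neither of which the page states, and the `summarize` helper is a hypothetical name used only for this example. Because only 10 of the 45 benchmarks are visible here, the average it produces covers those ten rows and will not match the page's 59.2% figure.

```python
# Percent scores copied from the ten benchmark rows shown on this page.
visible_scores = {
    "DocVQA": 95.2,
    "VocalSound": 93.9,
    "GSM8k": 88.7,
    "GiantSteps Tempo": 88.0,
    "ChartQA": 85.3,
    "TextVQA": 84.4,
    "AI2D": 83.2,
    "MMBench-V1.1": 81.8,
    "HumanEval": 78.7,
    "CRPErelation": 76.5,
}


def summarize(scores, threshold=80.0):
    """Hypothetical helper: unweighted summary over a {benchmark: percent} dict."""
    values = list(scores.values())
    return {
        "count": len(values),
        # Assumption: the page's average is a simple mean of percent scores.
        "average": round(sum(values) / len(values), 1),
        "best": max(values),
        # Assumption: "High Performers (80%+)" means score >= 80.
        "high_performers": sum(1 for v in values if v >= threshold),
    }


summary = summarize(visible_scores)
print(summary)
```

The best score and the count of results at or above 80% from these ten rows agree with the page's summary (95.2% and 8), which is what one would expect if the remaining 35 benchmarks all score lower than the rows shown.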