
Qwen2.5-Omni-7B
Multimodal
Zero-eval
#1 VocalSound
#1 GiantSteps Tempo
#1 MMBench-V1.1
+24 more
by Alibaba Cloud / Qwen Team
About
Qwen2.5-Omni-7B is a multimodal language model developed by Alibaba Cloud / Qwen Team. It has self-reported results on 45 benchmarks, with an average score of 59.2%. Its highest scores are on DocVQA (95.2%), VocalSound (93.9%), and GSM8k (88.7%). As a multimodal model, it accepts text, images, and audio, among other input formats. It is licensed under Apache 2.0, which permits commercial use. It was released on March 27, 2025.
Timeline
Announced: Mar 27, 2025
Released: Mar 27, 2025
Specifications
Capabilities: Multimodal
License & Family
License: Apache 2.0
Performance Overview
Performance metrics and category breakdown
Overall Performance
45 benchmarks
Average Score: 59.2%
Best Score: 95.2%
High Performers (80%+): 8+
All Benchmark Results for Qwen2.5-Omni-7B
Complete list of benchmark scores with detailed information
Benchmark | Category | Score | Percent | Source
DocVQA | multimodal | 0.95 | 95.2% | Self-reported
VocalSound | audio | 0.94 | 93.9% | Self-reported
GSM8k | text | 0.89 | 88.7% | Self-reported
GiantSteps Tempo | audio | 0.88 | 88.0% | Self-reported
ChartQA | multimodal | 0.85 | 85.3% | Self-reported
TextVQA | multimodal | 0.84 | 84.4% | Self-reported
AI2D | multimodal | 0.83 | 83.2% | Self-reported
MMBench-V1.1 | multimodal | 0.82 | 81.8% | Self-reported
HumanEval | text | 0.79 | 78.7% | Self-reported
CRPErelation | text | 0.77 | 76.5% | Self-reported
Showing 1 to 10 of 45 benchmarks
...
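The overview figures (average score, best score, count of results at 80% or higher) are simple aggregates over the benchmark rows. As a minimal sketch, assuming only the ten rows visible in the table above, they could be recomputed as follows. The names `VISIBLE_ROWS`, `summarize`, and `by_category` are hypothetical and not part of any published tooling. Because only 10 of the 45 benchmarks are listed here, the average this produces will not match the 59.2% reported for the full set.

```python
from collections import defaultdict
from statistics import mean

# The ten rows visible in the table above: (benchmark, category, score in percent).
# The remaining 35 benchmarks are not shown on this page and are NOT included.
VISIBLE_ROWS = [
    ("DocVQA", "multimodal", 95.2),
    ("VocalSound", "audio", 93.9),
    ("GSM8k", "text", 88.7),
    ("GiantSteps Tempo", "audio", 88.0),
    ("ChartQA", "multimodal", 85.3),
    ("TextVQA", "multimodal", 84.4),
    ("AI2D", "multimodal", 83.2),
    ("MMBench-V1.1", "multimodal", 81.8),
    ("HumanEval", "text", 78.7),
    ("CRPErelation", "text", 76.5),
]


def summarize(rows, threshold=80.0):
    """Return count, average, best score, and number of scores >= threshold."""
    scores = [score for _, _, score in rows]
    return {
        "count": len(scores),
        "average": round(mean(scores), 2),
        "best": max(scores),
        "high_performers": sum(score >= threshold for score in scores),
    }


def by_category(rows):
    """Return the mean score per category, rounded to two decimals."""
    grouped = defaultdict(list)
    for _, category, score in rows:
        grouped[category].append(score)
    return {cat: round(mean(vals), 2) for cat, vals in grouped.items()}


summary = summarize(VISIBLE_ROWS)
categories = by_category(VISIBLE_ROWS)
print(summary)
print(categories)
```

On the visible rows, the best score and the number of results at 80% or higher agree with the overview (95.2% and 8). The average is higher than the reported overall figure because the page lists the top-scoring benchmarks first.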