Qwen2.5-Omni-7B
Multimodal
Zero-eval
#1 VocalSound
#1 GiantSteps Tempo
#1 MMBench-V1.1
+24 more
by Alibaba Cloud / Qwen Team
About
Qwen2.5-Omni-7B is a multimodal model that supports text, audio, and other modalities, and is designed for integrated understanding across diverse input types. With 7 billion parameters, it targets efficient omni-modal processing that goes beyond text-only or vision-language models.
Timeline
Announced: Mar 27, 2025
Released: Mar 27, 2025
Specifications
Capabilities
Multimodal
License & Family
License
Apache 2.0
Performance Overview
Performance metrics and category breakdown
Overall Performance
45 benchmarks
Average Score
59.2%
Best Score
95.2%
High Performers (80%+)
8+
All Benchmark Results for Qwen2.5-Omni-7B
Complete list of benchmark scores with detailed information
| Benchmark | Category | Score | Score (%) | Source |
| --- | --- | --- | --- | --- |
| DocVQA | multimodal | 0.95 | 95.2% | Self-reported |
| VocalSound | audio | 0.94 | 93.9% | Self-reported |
| GSM8k | text | 0.89 | 88.7% | Self-reported |
| GiantSteps Tempo | audio | 0.88 | 88.0% | Self-reported |
| ChartQA | multimodal | 0.85 | 85.3% | Self-reported |
| TextVQA | multimodal | 0.84 | 84.4% | Self-reported |
| AI2D | multimodal | 0.83 | 83.2% | Self-reported |
| MMBench-V1.1 | multimodal | 0.82 | 81.8% | Self-reported |
| HumanEval | text | 0.79 | 78.7% | Self-reported |
| CRPErelation | text | 0.77 | 76.5% | Self-reported |
Showing 1 to 10 of 45 benchmarks
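The summary figures in the overview (best score, count of 80%+ results) can be checked against the visible rows with a short script. The sketch below is illustrative only: the `ROWS` list and the `summarize` helper are hypothetical names, and the data covers just the ten rows listed here, so the average it computes will not match the 59.2% figure, which spans all 45 benchmarks.

```python
# The ten benchmark rows shown on this page: (benchmark, category, percent score).
# The remaining 35 benchmarks are not listed here and are not included.
ROWS = [
    ("DocVQA", "multimodal", 95.2),
    ("VocalSound", "audio", 93.9),
    ("GSM8k", "text", 88.7),
    ("GiantSteps Tempo", "audio", 88.0),
    ("ChartQA", "multimodal", 85.3),
    ("TextVQA", "multimodal", 84.4),
    ("AI2D", "multimodal", 83.2),
    ("MMBench-V1.1", "multimodal", 81.8),
    ("HumanEval", "text", 78.7),
    ("CRPErelation", "text", 76.5),
]


def summarize(rows, threshold=80.0):
    """Return count, best score, mean score, and number of rows at or above threshold."""
    scores = [score for _, _, score in rows]
    return {
        "count": len(scores),
        "best": max(scores),
        "average": round(sum(scores) / len(scores), 1),
        "high_performers": sum(1 for score in scores if score >= threshold),
    }


if __name__ == "__main__":
    print(summarize(ROWS))
```

On these ten rows the best score is 95.2 and eight results sit at or above 80%, which agrees with the overview figures for those two fields.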
...