Comprehensive side-by-side LLM comparison
QvQ-72B-Preview leads with a 27.4% higher average benchmark score. Overall, it is the stronger choice for coding tasks.
Microsoft
Phi-3.5 Vision was developed as a multimodal variant of Phi-3.5, designed to understand and reason about both images and text. Built to extend the Phi family's efficiency into vision-language tasks, it enables compact multimodal AI for practical applications.
Alibaba Cloud / Qwen Team
QVQ-72B Preview was introduced as an experimental visual question answering model, designed to combine vision and language understanding for complex reasoning tasks. Built to demonstrate advanced multimodal reasoning capabilities, it represents Qwen's exploration into models that can analyze and reason about visual information.
QvQ-72B-Preview is 4 months newer.

Phi-3.5-vision-instruct (Microsoft), 2024-08-23
QvQ-72B-Preview (Alibaba Cloud / Qwen Team), 2024-12-25
Average performance across 2 common benchmarks
[Chart: Phi-3.5-vision-instruct vs. QvQ-72B-Preview]
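
The page does not say whether a lead such as "27.4% higher" is a relative difference or a gap in percentage points, and the per-benchmark scores are not shown here. As a minimal sketch of both readings, assuming each model's average is a plain arithmetic mean over the shared benchmarks, the calculation might look like the following. The function names and the example scores are hypothetical placeholders, not values from this comparison.

```python
def average(scores):
    """Arithmetic mean of a non-empty list of benchmark scores."""
    if not scores:
        raise ValueError("scores must be non-empty")
    return sum(scores) / len(scores)


def relative_lead_percent(leader_scores, other_scores):
    """Percent by which the leader's average exceeds the other's average."""
    other_avg = average(other_scores)
    if other_avg == 0:
        raise ValueError("other model's average must be non-zero")
    return (average(leader_scores) - other_avg) / other_avg * 100.0


def point_lead(leader_scores, other_scores):
    """Absolute gap between the two averages, in score points."""
    return average(leader_scores) - average(other_scores)


if __name__ == "__main__":
    # Hypothetical placeholder scores on two shared benchmarks (not real data).
    model_a = [50.0, 70.0]
    model_b = [40.0, 60.0]
    print(f"relative lead: {relative_lead_percent(model_a, model_b):.1f}%")
    print(f"point lead: {point_lead(model_a, model_b):.1f}")
```

The two readings can differ substantially for the same pair of averages, so a headline percentage is easier to interpret when the underlying scores are listed next to it.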
Available providers and their performance metrics
[Provider comparison: Phi-3.5-vision-instruct vs. QvQ-72B-Preview]