Comprehensive side-by-side LLM comparison
Qwen2-VL-72B-Instruct leads with a 9.6% higher average benchmark score. Overall, it is the stronger choice for coding tasks.
xAI
Grok 1.5V was introduced as a vision-enabled variant of Grok 1.5, designed to understand and reason about both images and text. Built to extend Grok's capabilities into multimodal applications, it enables visual question answering and image analysis alongside textual understanding.
Alibaba Cloud / Qwen Team
Qwen2-VL 72B was developed as a large vision-language model, designed to handle multimodal tasks combining visual and textual understanding. Built with 72 billion parameters for integrated vision and language processing, it enables applications requiring sophisticated analysis of images alongside text.
Grok-1.5V (xAI): 2024-04-12
Qwen2-VL-72B-Instruct (Alibaba Cloud / Qwen Team): 2024-08-29 (4 months newer)
Average performance across 3 common benchmarks
[Chart: Grok-1.5V vs. Qwen2-VL-72B-Instruct]

Qwen2-VL-72B-Instruct
2023-06-30
Available providers and their performance metrics

Grok-1.5V

Qwen2-VL-72B-Instruct
