Qwen2.5-VL 32B Instruct

Author: Alibaba / Qwen

Multimodal

About

Qwen2.5-VL-32B-Instruct is a 32-billion-parameter vision-language model from Alibaba. It extends the Qwen2.5 architecture with multimodal input, so it can interpret images, documents, charts, and video frames alongside text. The model was designed for tasks that depend on visual reasoning, such as document parsing, table extraction, and spatial understanding, which made it a practical choice for document intelligence and visual data analysis workflows. As an open-weight model, it became a widely adopted base for fine-tuning domain-specific multimodal applications.

Timeline

Released: Mar 1, 2025

Specifications

Capabilities

Multimodal

License & Family

License

Apache 2.0

Performance Overview

Performance metrics and category breakdown

2 benchmarks

Average Score

44.0%

Best Score

84.0%

High Performers (80%+)

Coding

84.0%

Agents

3.9%

All Benchmark Results for Qwen2.5-VL 32B Instruct

Complete list of benchmark scores with detailed information


Benchmark	Category	Score	Percent	Status
MBPP	Coding	84.00	84.0%	Unverified
OSWorld	Agents	3.90	3.9%	Unverified
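
The overview figures (average score, best score, high performers) appear to follow directly from the two benchmark rows. The sketch below reproduces that arithmetic under stated assumptions: the average is taken to be an unweighted mean rounded half-up to one decimal place, and the 80% cutoff is taken to be inclusive. The function name `summarize` is hypothetical and is not the catalog's actual code.

```python
from decimal import Decimal, ROUND_HALF_UP

# Percent scores copied from the benchmark table (both listed as unverified).
SCORES = {"MBPP": Decimal("84.0"), "OSWorld": Decimal("3.9")}


def summarize(scores, threshold=Decimal("80")):
    """Assumed derivation of the overview stats (hypothetical helper)."""
    # Unweighted mean across all benchmarks.
    mean = sum(scores.values()) / len(scores)
    # Round half-up to one decimal place, matching the displayed precision.
    average = mean.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)
    best = max(scores.values())
    # Benchmarks at or above the cutoff (inclusive cutoff is an assumption).
    high = [name for name, score in scores.items() if score >= threshold]
    return {"average": average, "best": best, "high_performers": high}


if __name__ == "__main__":
    print(summarize(SCORES))
```

With only two benchmarks the mean is 43.95, which rounds to the 44.0% shown, so the average mostly reflects the gap between the coding score and the agent score rather than typical performance.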

Resources