Phi-3.5-vision-instruct

Multimodal
Zero-eval
#1 ScienceQA
#1 POPE
#2 InterGPS

by Microsoft

About

Phi-3.5 Vision was developed as a multimodal variant of Phi-3.5, designed to understand and reason about both images and text. Built to extend the Phi family's efficiency into vision-language tasks, it enables compact multimodal AI for practical applications.

Timeline
Announced: Aug 23, 2024
Released: Aug 23, 2024
Specifications
Training Tokens: 500.0B
Capabilities
Multimodal
License & Family
License: MIT
Performance Overview
Performance metrics and category breakdown

Overall Performance (9 benchmarks)

Average Score: 68.3%
Best Score: 91.3%
High Performers (80%+): 4
All Benchmark Results for Phi-3.5-vision-instruct
Complete list of benchmark scores with detailed information
Benchmark    Category     Score    Source
ScienceQA    multimodal   91.3%    Self-reported
POPE         multimodal   86.1%    Self-reported
MMBench      multimodal   81.9%    Self-reported
ChartQA      multimodal   81.8%    Self-reported
AI2D         multimodal   78.1%    Self-reported
TextVQA      multimodal   72.0%    Self-reported
MathVista    multimodal   43.9%    Self-reported
MMMU         multimodal   43.0%    Self-reported
InterGPS     text         36.3%    Self-reported
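The summary figures in the Performance Overview follow directly from the per-benchmark scores listed above. A minimal sketch that recomputes them, assuming an unweighted mean across all nine benchmarks and an 80% cutoff for "high performers":

```python
# Self-reported benchmark scores for Phi-3.5-vision-instruct,
# copied from the table above (percentages).
scores = {
    "ScienceQA": 91.3,
    "POPE": 86.1,
    "MMBench": 81.9,
    "ChartQA": 81.8,
    "AI2D": 78.1,
    "TextVQA": 72.0,
    "MathVista": 43.9,
    "MMMU": 43.0,
    "InterGPS": 36.3,
}

# Unweighted mean over all benchmarks, rounded to one decimal place.
average = round(sum(scores.values()) / len(scores), 1)

# Single best benchmark score.
best = max(scores.values())

# Number of benchmarks at or above the 80% threshold.
high_performers = sum(1 for s in scores.values() if s >= 80.0)

print(f"Average Score: {average}%")            # 68.3%
print(f"Best Score: {best}%")                  # 91.3%
print(f"High Performers (80%+): {high_performers}")  # 4
```

The recomputed values match the overview exactly, which confirms the average is a simple unweighted mean rather than a category-weighted one.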