Phi-4-multimodal-instruct

Name: Phi-4-multimodal-instruct
Price: 0.05 USD
Rating: 72.0 (15 reviews)
Author: Microsoft

Multimodal

Zero-eval

#1 ScienceQA Visual

#1 BLINK

#1 InterGPS

About

Phi-4 Multimodal handles multiple input modalities, including text, images, and audio. It extends Phi-4's efficiency to multimodal applications and shows that a compact model can integrate diverse information types.

Pricing Range

Input (per 1M tokens): $0.05 - $0.05

Output (per 1M tokens): $0.10 - $0.10

Providers: 1
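As a rough illustration of what the listed prices mean per request, the sketch below converts token counts into an estimated cost. It assumes simple linear per-token billing at the listed rates; the function name `estimate_cost_usd` is hypothetical, and the listing does not say how image or audio inputs are counted in tokens, so this covers token counts only.

```python
# Listed prices in USD per 1M tokens (from the pricing figures in this listing).
INPUT_USD_PER_MILLION = 0.05
OUTPUT_USD_PER_MILLION = 0.10


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request.

    Assumption: billing is linear in token count, with no minimum charge,
    rounding, or provider-specific surcharge.
    """
    input_cost = input_tokens / 1_000_000 * INPUT_USD_PER_MILLION
    output_cost = output_tokens / 1_000_000 * OUTPUT_USD_PER_MILLION
    return input_cost + output_cost


if __name__ == "__main__":
    # 200K input tokens and 50K output tokens.
    print(f"${estimate_cost_usd(200_000, 50_000):.4f}")
```

Under these assumptions, output tokens cost twice as much as input tokens, so long generations dominate the bill more quickly than long prompts.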

Timeline

Announced: Feb 1, 2025

Released: Feb 1, 2025

Knowledge Cutoff: Jun 1, 2024

Specifications

Training Tokens: 5.0T

Capabilities

Multimodal

License & Family

License: MIT

Performance Overview

Performance metrics and category breakdown

15 benchmarks

Average Score: 72.0%

Best Score: 97.5%

High Performers (80%+)

Max Context Window: 256.0K

Avg Throughput: 25.0 tok/s

Avg Latency: 1ms
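To put the throughput and latency figures in perspective, the following sketch estimates how long a response of a given length might take. It assumes the listed latency is the delay before the first token and that throughput stays constant during generation; neither assumption is stated in the listing, and the function name `estimate_generation_seconds` is hypothetical.

```python
# Figures from the performance overview in this listing.
AVG_THROUGHPUT_TOK_PER_S = 25.0
AVG_LATENCY_S = 0.001  # 1 ms


def estimate_generation_seconds(output_tokens: int) -> float:
    """Estimate wall-clock time to generate `output_tokens` tokens.

    Assumption: total time = initial latency + tokens / throughput,
    with throughput constant for the whole response.
    """
    return AVG_LATENCY_S + output_tokens / AVG_THROUGHPUT_TOK_PER_S


if __name__ == "__main__":
    print(f"{estimate_generation_seconds(500):.3f} s")
```

At these figures the generation term dominates: the listed latency is negligible next to the time spent producing tokens.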

All Benchmark Results for Phi-4-multimodal-instruct

Complete list of benchmark scores with detailed information

Resources