Microsoft

Phi 4

Zero-eval
#3 PhiBench

by Microsoft

About

Phi-4 is the fourth generation of Microsoft's Phi series of small language models. It aims to show how much a compact model can achieve, combining updated training techniques with architectural improvements to deliver efficient, high-quality language modeling.

Pricing Range
Input (per 1M): $0.07 - $0.07
Output (per 1M): $0.14 - $0.14
Providers: 1
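
As a rough illustration of how the per-1M prices above translate into per-request cost, here is a minimal sketch. The helper name `estimate_cost` and the example token counts are hypothetical; the sketch assumes the prices are in USD per one million tokens and that billing is linear in token count, which the page does not state explicitly.

```python
# Prices copied from the Pricing Range section (assumed USD per 1M tokens).
INPUT_PRICE_PER_M = 0.07
OUTPUT_PRICE_PER_M = 0.14


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request (hypothetical helper).

    Assumes simple linear billing: tokens * price / 1,000,000.
    """
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


# Example: a 20,000-token prompt with a 2,000-token completion.
print(f"${estimate_cost(20_000, 2_000):.5f}")
```

Under these assumptions, output tokens cost twice as much as input tokens, so long completions dominate the bill sooner than long prompts do.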
Timeline
Announced: Dec 12, 2024
Released: Dec 12, 2024
Knowledge Cutoff: Jun 1, 2024
Specifications
Training Tokens: 9.8T
License & Family
License: MIT
Performance Overview
Performance metrics and category breakdown

Overall Performance

13 benchmarks
Average Score: 66.0%
Best Score: 84.8%
High Performers (80%+): 5

Performance Metrics

Max Context Window: 32.0K
Avg Throughput: 33.0 tok/s
Avg Latency: 0ms
All Benchmark Results for Phi 4
Complete list of benchmark scores with detailed information
Benchmark    Modality  Score  Percent  Source
MMLU         text      0.85   84.8%    Self-reported
HumanEval+   text      0.83   82.8%    Self-reported
HumanEval    text      0.83   82.6%    Self-reported
MGSM         text      0.81   80.6%    Self-reported
MATH         text      0.80   80.4%    Self-reported
DROP         text      0.76   75.5%    Self-reported
Arena Hard   text      0.75   75.4%    Self-reported
MMLU-Pro     text      0.70   70.4%    Self-reported
IFEval       text      0.63   63.0%    Self-reported
PhiBench     text      0.56   56.2%    Self-reported
Showing 1 to 10 of 13 benchmarks
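
The summary figures in the Overall Performance section can be partly cross-checked against the ten scores listed here. The following is a minimal sketch; the `summarize` helper and the 80% threshold parameter are illustrative assumptions, and the three benchmarks not shown on this page are not included.

```python
from statistics import mean

# The ten self-reported scores visible on this page, in percent.
# The remaining 3 of the 13 benchmarks are not listed and are omitted here.
VISIBLE_SCORES = {
    "MMLU": 84.8,
    "HumanEval+": 82.8,
    "HumanEval": 82.6,
    "MGSM": 80.6,
    "MATH": 80.4,
    "DROP": 75.5,
    "Arena Hard": 75.4,
    "MMLU-Pro": 70.4,
    "IFEval": 63.0,
    "PhiBench": 56.2,
}


def summarize(scores: dict[str, float], threshold: float = 80.0) -> dict:
    """Compute summary statistics over a score table (hypothetical helper)."""
    values = list(scores.values())
    return {
        "count": len(values),
        "average": round(mean(values), 1),
        "best": max(values),
        "high_performers": sum(v >= threshold for v in values),
    }


print(summarize(VISIBLE_SCORES))
```

For the ten visible rows this gives a best score of 84.8 and five benchmarks at or above 80%, matching the figures reported above. The average over these ten rows (75.2) is higher than the reported 66.0% for all 13 benchmarks, which suggests the three unlisted benchmarks score lower than the ones shown.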