
Phi 4
Zero-eval
#3 PhiBench
by Microsoft
About
Phi 4 is a language model developed by Microsoft. It averages 66.0% across 13 benchmarks, with its highest scores on MMLU (84.8%), HumanEval+ (82.8%), and HumanEval (82.6%). The model is available through one API provider. It is licensed for commercial use (MIT), which makes it an option for enterprise applications. It was announced and released on Dec 12, 2024.
Pricing Range
Input (per 1M): $0.07 - $0.07
Output (per 1M): $0.14 - $0.14
Providers: 1
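As a rough illustration of how the listed prices translate into a per-request cost, here is a minimal sketch. It assumes "per 1M" means per one million tokens and that input and output tokens are billed separately at the rates above. The function name `estimate_cost_usd` is hypothetical and not part of any provider's API.

```python
# Listed prices from the Pricing Range section (assumed to be USD per 1M tokens).
INPUT_USD_PER_MILLION = 0.07
OUTPUT_USD_PER_MILLION = 0.14


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request from its token counts (hypothetical helper)."""
    total = (
        input_tokens * INPUT_USD_PER_MILLION
        + output_tokens * OUTPUT_USD_PER_MILLION
    )
    return total / 1_000_000


# Example: a 2,000-token prompt with a 500-token completion.
print(f"{estimate_cost_usd(2_000, 500):.6f}")  # prints 0.000210
```

Actual billing may differ (minimum charges, rounding, provider-specific rules), so treat the result as an estimate only.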
Timeline
Announced: Dec 12, 2024
Released: Dec 12, 2024
Knowledge Cutoff: Jun 1, 2024
Specifications
Training Tokens: 9.8T
License & Family
License: MIT
Performance Overview
Performance metrics and category breakdown
Overall Performance
13 benchmarks
Average Score
66.0%
Best Score
84.8%
High Performers (80%+)
5
Performance Metrics
Max Context Window
32.0K
Avg Throughput
33.0 tok/s
Avg Latency
0ms
All Benchmark Results for Phi 4
Complete list of benchmark scores with detailed information
Benchmark | Modality | Score | Score (%) | Source
MMLU | text | 0.85 | 84.8% | Self-reported
HumanEval+ | text | 0.83 | 82.8% | Self-reported
HumanEval | text | 0.83 | 82.6% | Self-reported
MGSM | text | 0.81 | 80.6% | Self-reported
MATH | text | 0.80 | 80.4% | Self-reported
DROP | text | 0.76 | 75.5% | Self-reported
Arena Hard | text | 0.75 | 75.4% | Self-reported
MMLU-Pro | text | 0.70 | 70.4% | Self-reported
IFEval | text | 0.63 | 63.0% | Self-reported
PhiBench | text | 0.56 | 56.2% | Self-reported
Showing 1 to 10 of 13 benchmarks
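Only 10 of the 13 benchmarks are listed, so the listed scores will not reproduce the 66.0% average on their own. The sketch below recomputes the summary figures from the ten listed rows and backs out a rough average for the three benchmarks not shown. It assumes the reported average is a plain arithmetic mean of the percentage scores, which the page does not confirm. The reported average is also rounded, so the final figure is approximate.

```python
# Percentage scores for the 10 benchmarks listed on this page.
shown_pct = {
    "MMLU": 84.8,
    "HumanEval+": 82.8,
    "HumanEval": 82.6,
    "MGSM": 80.6,
    "MATH": 80.4,
    "DROP": 75.5,
    "Arena Hard": 75.4,
    "MMLU-Pro": 70.4,
    "IFEval": 63.0,
    "PhiBench": 56.2,
}

REPORTED_AVERAGE_PCT = 66.0  # "Average Score" over all 13 benchmarks
TOTAL_BENCHMARKS = 13

# Mean of the listed rows only.
shown_mean = sum(shown_pct.values()) / len(shown_pct)

# Benchmarks at or above the 80% "High Performers" threshold.
high_performers = [name for name, pct in shown_pct.items() if pct >= 80.0]

# Assumption: the reported average is an unweighted mean, so the unlisted
# scores must make up the difference between the two totals.
hidden_count = TOTAL_BENCHMARKS - len(shown_pct)
hidden_total = REPORTED_AVERAGE_PCT * TOTAL_BENCHMARKS - sum(shown_pct.values())
hidden_mean_estimate = hidden_total / hidden_count

print(f"Mean of listed benchmarks: {shown_mean:.2f}%")
print(f"High performers (80%+): {len(high_performers)}")
print(f"Estimated mean of unlisted benchmarks: {hidden_mean_estimate:.1f}%")
```

Under that assumption the ten listed benchmarks average about 75.2%, and the count of five high performers matches the page's own figure.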