Phi 4
#3 on PhiBench
by Microsoft
About
Phi-4 is the fourth generation of Microsoft's small language model series, designed to push the limits of what compact models can achieve. Built with refined training techniques and architectural improvements, it continues the series' focus on efficient, high-quality language modeling.
Pricing Range
Input (per 1M tokens): $0.07
Output (per 1M tokens): $0.14
Providers: 1
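
To show how these per-token rates translate into per-request cost, here is a minimal sketch; the token counts in the example are hypothetical, and the rates are the single-provider prices listed above.

```python
# Rough cost estimate for a single Phi 4 request, assuming the listed rates
# of $0.07 per 1M input tokens and $0.14 per 1M output tokens.

INPUT_PRICE_PER_M = 0.07   # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 0.14  # USD per 1M output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for one request."""
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_M \
         + (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M

# Example (hypothetical): a 2,000-token prompt with a 500-token completion
print(f"${request_cost(2_000, 500):.6f}")  # ~$0.000210
```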
Timeline
Announced: Dec 12, 2024
Released: Dec 12, 2024
Knowledge Cutoff: Jun 1, 2024
Specifications
Training Tokens: 9.8T
License & Family
License: MIT
Performance Overview
Performance metrics and category breakdown
Overall Performance (13 benchmarks)
Average Score: 66.0%
Best Score: 84.8%
High Performers (80%+): 5

Performance Metrics
Max Context Window: 32.0K
Avg Throughput: 33.0 tok/s
Avg Latency: 0 ms
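
As a rough guide to what the throughput figure means in practice, the sketch below estimates end-to-end generation time with a simple latency-plus-streaming model; the 500-token completion length is a hypothetical example, and the constants are the averages reported above.

```python
# Back-of-the-envelope generation-time estimate from the reported averages:
# ~33 tok/s average throughput and a reported 0 ms average latency.

AVG_THROUGHPUT_TOK_S = 33.0  # reported average throughput
AVG_LATENCY_S = 0.0          # reported average latency

def estimated_generation_time(output_tokens: int) -> float:
    """Estimate seconds to stream a completion of the given length."""
    return AVG_LATENCY_S + output_tokens / AVG_THROUGHPUT_TOK_S

# Example (hypothetical): a 500-token completion takes ~15 s at 33 tok/s.
print(f"{estimated_generation_time(500):.1f} s")
```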
All Benchmark Results for Phi 4
Complete list of benchmark scores with detailed information
| Benchmark | Modality | Score | Score (%) | Source |
|---|---|---|---|---|
| MMLU | text | 0.85 | 84.8% | Self-reported |
| HumanEval+ | text | 0.83 | 82.8% | Self-reported |
| HumanEval | text | 0.83 | 82.6% | Self-reported |
| MGSM | text | 0.81 | 80.6% | Self-reported |
| MATH | text | 0.80 | 80.4% | Self-reported |
| DROP | text | 0.76 | 75.5% | Self-reported |
| Arena Hard | text | 0.75 | 75.4% | Self-reported |
| MMLU-Pro | text | 0.70 | 70.4% | Self-reported |
| IFEval | text | 0.63 | 63.0% | Self-reported |
| PhiBench | text | 0.56 | 56.2% | Self-reported |
Showing 10 of 13 benchmarks
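
The overview figures (average score, best score, count of 80%+ benchmarks) are simple aggregates over the per-benchmark percentages. The sketch below reproduces that arithmetic over the ten rows shown here; since the remaining three benchmarks are not listed, the average will differ from the 13-benchmark total above.

```python
# Aggregate the self-reported scores listed above (10 of the 13 benchmarks).
scores = {
    "MMLU": 84.8, "HumanEval+": 82.8, "HumanEval": 82.6, "MGSM": 80.6,
    "MATH": 80.4, "DROP": 75.5, "Arena Hard": 75.4, "MMLU-Pro": 70.4,
    "IFEval": 63.0, "PhiBench": 56.2,
}

average = sum(scores.values()) / len(scores)
best = max(scores.values())
high_performers = [name for name, pct in scores.items() if pct >= 80.0]

print(f"Average score (10 listed benchmarks): {average:.1f}%")
print(f"Best score: {best:.1f}%")                          # 84.8% (MMLU)
print(f"High performers (80%+): {len(high_performers)}")   # 5
```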