Microsoft

Phi 4 Reasoning

Zero-eval rankings:
#1 HumanEval+
#2 FlenQA
#2 OmniMath
+1 more

by Microsoft

About

Phi 4 Reasoning adds extended analytical reasoning to the Phi 4 architecture: the model is designed to spend more time on complex problem-solving. It aims to pair the efficiency of a compact model with greater reasoning depth, and reflects Microsoft's exploration of small models that reason.

Timeline
Announced: Apr 30, 2025
Released: Apr 30, 2025
Knowledge Cutoff: Mar 1, 2025
Specifications
Training Tokens: 16.0B
License & Family
License: MIT
Base Model: Phi 4
Performance Overview
Performance metrics and category breakdown

Overall Performance

Benchmarks: 11
Average Score: 75.1%
Best Score: 97.7%
High Performers (80%+): 3
All Benchmark Results for Phi 4 Reasoning
Complete list of benchmark scores with detailed information
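| Benchmark | Modality | Score (0-1) | Score (%) | Source |
|---|---|---|---|---|
| FlenQA | text | 0.98 | 97.7% | Self-reported |
| HumanEval+ | text | 0.93 | 92.9% | Self-reported |
| IFEval | text | 0.83 | 83.4% | Self-reported |
| OmniMath | text | 0.77 | 76.6% | Self-reported |
| AIME 2024 | text | 0.75 | 75.3% | Self-reported |
| MMLU-Pro | text | 0.74 | 74.3% | Self-reported |
| Arena Hard | text | 0.73 | 73.3% | Self-reported |
| PhiBench | text | 0.71 | 70.6% | Self-reported |
| GPQA | text | 0.66 | 65.8% | Self-reported |
| AIME 2025 | text | 0.63 | 62.9% | Self-reported |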
Showing 1 to 10 of 11 benchmarks
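
The summary figures in the overview (average, best score, count of results at 80% or higher) can be recomputed from the listed scores. The sketch below is illustrative only: the `summarize` helper and its `threshold` parameter are hypothetical names, and the page does not state how its aggregates are calculated. It uses only the ten results shown above, so its average will not match the 75.1% reported for all 11 benchmarks.

```python
# Scores (percent) copied from the ten benchmark results listed on this page.
# The eleventh benchmark is not shown, so it is deliberately left out.
scores = {
    "FlenQA": 97.7,
    "HumanEval+": 92.9,
    "IFEval": 83.4,
    "OmniMath": 76.6,
    "AIME 2024": 75.3,
    "MMLU-Pro": 74.3,
    "Arena Hard": 73.3,
    "PhiBench": 70.6,
    "GPQA": 65.8,
    "AIME 2025": 62.9,
}


def summarize(scores, threshold=80.0):
    """Return simple aggregate statistics for a benchmark -> percent mapping.

    Assumption: a plain unweighted mean and an inclusive threshold; the
    site's actual aggregation method is not documented here.
    """
    values = list(scores.values())
    return {
        "count": len(values),
        "average": round(sum(values) / len(values), 1),
        "best": max(values),
        "high_performers": sorted(
            name for name, value in scores.items() if value >= threshold
        ),
    }


summary = summarize(scores)
print(summary)
```

Over these ten results the unweighted mean is 77.3%, the best score is 97.7%, and three benchmarks (FlenQA, HumanEval+, IFEval) clear the 80% threshold, which agrees with the best score and high-performer count in the overview. The gap between 77.3% and the reported 75.1% is expected, because the reported average includes the unlisted eleventh benchmark.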