Phi 4 Reasoning Plus

Zero-eval
#1 FlenQA
#1 OmniMath
#1 PhiBench
+1 more

by Microsoft

About

Phi 4 Reasoning Plus is a language model developed by Microsoft and released in 2025. It averages 78.9% across 11 benchmarks, with its strongest results on FlenQA (97.9%), HumanEval+ (92.3%), and IFEval (84.9%). It is released under the MIT license, which permits commercial use.

Timeline
Announced: Apr 30, 2025
Released: Apr 30, 2025
Knowledge Cutoff: Mar 1, 2025
Specifications
Training Tokens: 16.0B
License & Family
License: MIT
Performance Overview
Performance metrics and category breakdown

Overall Performance

Benchmarks: 11
Average Score: 78.9%
Best Score: 97.9%
High Performers (80%+): 5
All Benchmark Results for Phi 4 Reasoning Plus
Complete list of benchmark scores with detailed information
Benchmark    Modality  Score (0-1)  Score (%)  Source
FlenQA       text      0.98         97.9%      Self-reported
HumanEval+   text      0.92         92.3%      Self-reported
IFEval       text      0.85         84.9%      Self-reported
OmniMath     text      0.82         81.9%      Self-reported
AIME 2024    text      0.81         81.3%      Self-reported
Arena Hard   text      0.79         79.0%      Self-reported
AIME 2025    text      0.78         78.0%      Self-reported
MMLU-Pro     text      0.76         76.0%      Self-reported
PhiBench     text      0.74         74.2%      Self-reported
GPQA         text      0.69         68.9%      Self-reported
Showing 1 to 10 of 11 benchmarks
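
The summary figures can be checked against the ten scores that are listed. The sketch below is illustrative only: the `summarize` helper and the constant names are hypothetical, and it assumes the reported 78.9% is a simple unweighted mean over all 11 benchmarks, which the page does not state. Under that assumption, the one unlisted score can be estimated, though only roughly, because every published figure is rounded.

```python
# Scores (%) for the ten benchmarks listed on this page.
scores = {
    "FlenQA": 97.9,
    "HumanEval+": 92.3,
    "IFEval": 84.9,
    "OmniMath": 81.9,
    "AIME 2024": 81.3,
    "Arena Hard": 79.0,
    "AIME 2025": 78.0,
    "MMLU-Pro": 76.0,
    "PhiBench": 74.2,
    "GPQA": 68.9,
}


def summarize(scores, threshold=80.0):
    """Return count, mean, best score, and number of scores at or above threshold."""
    values = list(scores.values())
    return {
        "count": len(values),
        "mean": sum(values) / len(values),
        "best": max(values),
        "high_performers": sum(1 for v in values if v >= threshold),
    }


summary = summarize(scores)

# Assumption (not stated on the page): the reported average is an
# unweighted mean over all 11 benchmarks, one of which is not listed.
REPORTED_MEAN = 78.9
REPORTED_COUNT = 11
implied_missing = REPORTED_MEAN * REPORTED_COUNT - sum(scores.values())

print(f"Listed benchmarks:        {summary['count']}")
print(f"Mean of listed scores:    {summary['mean']:.2f}%")
print(f"Best score:               {summary['best']:.1f}%")
print(f"High performers (80%+):   {summary['high_performers']}")
print(f"Implied unlisted score:   ~{implied_missing:.1f}% (rough estimate)")
```

The best score and the count of high performers match the reported 97.9% and 5. The mean of the ten listed scores is higher than the reported 78.9%, which is consistent with the unlisted eleventh benchmark scoring well below the others.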