
Phi 4 Reasoning Plus
Zero-eval
Ranked #1: FlenQA, OmniMath, PhiBench (+1 more)
by Microsoft
About
Phi 4 Reasoning Plus is a language model developed by Microsoft. It averages 78.9% across 11 benchmarks, with its strongest results on FlenQA (97.9%), HumanEval+ (92.3%), and IFEval (84.9%). It is released under the MIT license, which permits commercial use. It was announced and released on April 30, 2025.
Timeline
Announced: Apr 30, 2025
Released: Apr 30, 2025
Knowledge Cutoff: Mar 1, 2025
Specifications
Training Tokens: 16.0B
License & Family
License: MIT
Performance Overview
Performance metrics and category breakdown
Overall Performance (11 benchmarks)
Average Score: 78.9%
Best Score: 97.9%
High Performers (80%+): 5+
All Benchmark Results for Phi 4 Reasoning Plus
Complete list of benchmark scores with detailed information
Benchmark | Modality | Score (0-1) | Score (%) | Source
FlenQA | text | 0.98 | 97.9% | Self-reported
HumanEval+ | text | 0.92 | 92.3% | Self-reported
IFEval | text | 0.85 | 84.9% | Self-reported
OmniMath | text | 0.82 | 81.9% | Self-reported
AIME 2024 | text | 0.81 | 81.3% | Self-reported
Arena Hard | text | 0.79 | 79.0% | Self-reported
AIME 2025 | text | 0.78 | 78.0% | Self-reported
MMLU-Pro | text | 0.76 | 76.0% | Self-reported
PhiBench | text | 0.74 | 74.2% | Self-reported
GPQA | text | 0.69 | 68.9% | Self-reported
Showing 1 to 10 of 11 benchmarks
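The summary figures in the Performance Overview (average, best score, count of results at 80% or above) can be recomputed from the listed rows. The sketch below is illustrative only: the `scores` dictionary and the `summarize` helper are hypothetical names, not part of any published tooling, and the data covers only the ten benchmarks shown on this page, since the eleventh is not listed.

```python
# Percent scores for the ten benchmarks listed on this page.
# The eleventh benchmark is not shown, so it is deliberately absent here.
scores = {
    "FlenQA": 97.9,
    "HumanEval+": 92.3,
    "IFEval": 84.9,
    "OmniMath": 81.9,
    "AIME 2024": 81.3,
    "Arena Hard": 79.0,
    "AIME 2025": 78.0,
    "MMLU-Pro": 76.0,
    "PhiBench": 74.2,
    "GPQA": 68.9,
}


def summarize(scores, threshold=80.0):
    """Return count, mean, best result, and names scoring at or above threshold."""
    values = list(scores.values())
    best_name = max(scores, key=scores.get)
    return {
        "count": len(values),
        "mean": round(sum(values) / len(values), 1),
        "best": (best_name, scores[best_name]),
        "high_performers": sorted(
            name for name, value in scores.items() if value >= threshold
        ),
    }


summary = summarize(scores)
print(summary)
```

Over the ten listed rows the mean comes out to 81.4%, which is higher than the reported 11-benchmark average of 78.9%; the gap is consistent with the unlisted eleventh score falling below the listed ones. Five of the listed benchmarks meet the 80% threshold.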