Phi 4 Reasoning
by Microsoft
Zero-eval rankings: #1 HumanEval+, #2 FlenQA, #2 OmniMath, +1 more
About
Phi-4 Reasoning adds extended analytical thinking to the Phi-4 architecture and is designed to spend more inference-time compute on complex problem-solving. By combining the efficiency of a compact model with greater reasoning depth, it represents Microsoft's exploration of thoughtful small models.
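As a concrete illustration, the sketch below loads the model with Hugging Face transformers and asks it a question. The repository id microsoft/Phi-4-reasoning and the chat-template details are assumptions, not taken from this page; check the official model card before relying on them.

```python
# Minimal sketch: local inference with Phi-4 Reasoning via transformers.
# Assumption: the checkpoint is published as "microsoft/Phi-4-reasoning"
# on Hugging Face and exposes a standard chat template.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-4-reasoning"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the checkpoint's native precision
    device_map="auto",    # spread layers across available devices
)

messages = [
    {"role": "user",
     "content": "A train travels 120 km in 90 minutes. What is its average speed in km/h?"}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Reasoning models tend to emit long chains of thought, so allow a
# generous token budget for the response.
outputs = model.generate(inputs, max_new_tokens=2048)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```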
Timeline
Announced: Apr 30, 2025
Released: Apr 30, 2025
Knowledge Cutoff: Mar 1, 2025
Specifications
Training Tokens: 16.0B
License & Family
License: MIT
Base Model: Phi 4
Performance Overview
Performance metrics and category breakdown
Overall Performance (11 benchmarks)
Average Score: 75.1%
Best Score: 97.7%
High Performers (80%+): 3
All Benchmark Results for Phi 4 Reasoning
Complete list of benchmark scores with detailed information
| Benchmark | Modality | Score | Percentage | Source |
|---|---|---|---|---|
| FlenQA | text | 0.98 | 97.7% | Self-reported |
| HumanEval+ | text | 0.93 | 92.9% | Self-reported |
| IFEval | text | 0.83 | 83.4% | Self-reported |
| OmniMath | text | 0.77 | 76.6% | Self-reported |
| AIME 2024 | text | 0.75 | 75.3% | Self-reported |
| MMLU-Pro | text | 0.74 | 74.3% | Self-reported |
| Arena Hard | text | 0.73 | 73.3% | Self-reported |
| PhiBench | text | 0.71 | 70.6% | Self-reported |
| GPQA | text | 0.66 | 65.8% | Self-reported |
| AIME 2025 | text | 0.63 | 62.9% | Self-reported |
Showing 1 to 10 of 11 benchmarks
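To show how the overview figures relate to the per-benchmark rows, the short sketch below recomputes them from the ten listed scores. Because the eleventh benchmark is not shown in the table, the average here will not exactly reproduce the 75.1% reported over all 11 benchmarks.

```python
# Recompute the overview statistics from the listed benchmark scores.
# Note: only 10 of the 11 benchmarks appear in the table, so the mean
# differs slightly from the 75.1% reported over the full set.
scores = {
    "FlenQA": 97.7, "HumanEval+": 92.9, "IFEval": 83.4, "OmniMath": 76.6,
    "AIME 2024": 75.3, "MMLU-Pro": 74.3, "Arena Hard": 73.3,
    "PhiBench": 70.6, "GPQA": 65.8, "AIME 2025": 62.9,
}

average = sum(scores.values()) / len(scores)                   # 77.28 over the 10 listed rows
best = max(scores.values())                                    # 97.7 (FlenQA)
high_performers = sum(1 for s in scores.values() if s >= 80)   # 3 (FlenQA, HumanEval+, IFEval)

print(f"Average: {average:.1f}%  Best: {best:.1f}%  High performers (80%+): {high_performers}")
```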