OmniMath

text

About

Omni-MATH is a comprehensive and challenging benchmark specifically designed to assess large language models' mathematical reasoning capabilities at the Olympiad level. It features complex mathematical problems that require advanced reasoning, problem-solving skills, and deep mathematical understanding, providing rigorous evaluation of models' mathematical competency.

Evaluation Stats

Total Models2

Organizations1

Verified Results0

Self-Reported2

Benchmark Details

Max Score1

Language

Performance Overview

Score distribution and top performers

Score Distribution

2 models

Top Score

81.9%

Average Score

79.3%

High Performers (80%+)

Top Organizations

#1Microsoft

2 models

79.3%

Leaderboard

2 models ranked by performance on OmniMath

			License		Links
#01Phi 4 Reasoning Plus	Microsoft	Apr 30, 2025	MIT	81.9%
#02Phi 4 Reasoning	Microsoft	Apr 30, 2025	MIT	76.6%

Resources

Research Paper