AIME 2025


About

The AIME 2025 benchmark evaluates AI models on 15 problems from the 2025 American Invitational Mathematics Examination. It measures advanced mathematical reasoning, problem-solving, and logical inference across algebra, geometry, and number theory. Each answer is an integer from 000 to 999, so a response is either exactly right or wrong, and reaching the correct value typically requires careful multi-step work. The benchmark is among the more demanding tests of AI mathematical capability.
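Because every answer is an integer from 000 to 999, scoring can be done by exact match. The following is a minimal sketch of such a scorer, not the leaderboard's actual harness: the function names `normalize_answer` and `score` are hypothetical, and the sample answers in the usage lines are made up for illustration. The result is a fraction between 0 and 1, which is consistent with the maximum score of 1 listed under Benchmark Details.

```python
import re
from typing import List, Optional


def normalize_answer(raw: str) -> Optional[int]:
    """Parse a final answer string into an integer in [0, 999].

    Accepts one to three ASCII digits (leading zeros allowed, e.g. "042").
    Returns None for anything else, which is then counted as incorrect.
    """
    text = raw.strip()
    if not re.fullmatch(r"[0-9]{1,3}", text):
        return None
    return int(text)


def score(predictions: List[str], references: List[int]) -> float:
    """Return the fraction of exactly matching answers (0.0 to 1.0)."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("at least one problem is required")
    correct = sum(
        1 for pred, ref in zip(predictions, references)
        if normalize_answer(pred) == ref
    )
    return correct / len(references)


# Illustrative usage with made-up answers (not real AIME 2025 data).
sample_predictions = ["042", "7", "not a number"]
sample_references = [42, 8, 5]
sample_score = score(sample_predictions, sample_references)
```

Treating unparseable output as wrong rather than raising an error keeps one malformed response from aborting a whole evaluation run; a stricter harness might prefer to log such cases separately.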

Evaluation Stats

Total Models: 48

Organizations: 11

Verified Results: 0

Self-Reported: 48

Benchmark Details

Max Score: 1

Language

Performance Overview

Score distribution and top performers

Score Distribution

48 models

Top Score: 100.0%

Average Score: 68.8%

High Performers (80%+)

Top Organizations

#1 Zhipu AI: 1 model, 93.9%

#2 xAI: 5 models, 93.6%

#3 Alibaba Cloud / Qwen Team: 7 models, 77.9%

#4 Anthropic: 6 models, 77.0%

#5 OpenAI: 7 models, 76.7%

Leaderboard

48 models ranked by performance on AIME 2025

Rank	Model	Organization	Release Date	License	Score
#01	Grok-4 Heavy	xAI	Jul 9, 2025	Proprietary	100.0%
#02	Claude Haiku 4.5	Anthropic	Oct 15, 2025	Proprietary	96.3%
#03	GPT-5	OpenAI	Aug 7, 2025	Proprietary	94.6%
#04	GLM-4.6	Zhipu AI	Sep 30, 2025	MIT	93.9%
#05	Grok-3	xAI	Feb 17, 2025	Proprietary	93.3%
#06	o4-mini	OpenAI	Apr 16, 2025	Proprietary	92.7%
#07	Qwen3-235B-A22B-Thinking-2507	Alibaba Cloud / Qwen Team	Jul 25, 2025	Apache 2.0	92.3%
#08	Grok 4 Fast	xAI	Aug 28, 2025	Proprietary	92.0%
#09	Grok-4	xAI	Jul 9, 2025	Proprietary	91.7%
#10	GPT-5 mini	OpenAI	Aug 7, 2025	Proprietary	91.1%

Showing 1 to 10 of 48 models
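
The per-organization figures under Top Organizations look like averages over each organization's models, though the page does not say so. As a rough sketch under that assumption, the snippet below groups the ten rows shown above by organization and takes the mean score. The names `VISIBLE_ROWS` and `mean_by_org` are hypothetical. Since only 10 of the 48 models are listed here, the results will generally differ from the page's figures; the one exception is an organization whose only model appears in the visible rows.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

# (model, organization, score in percent) for the ten rows shown on this page.
VISIBLE_ROWS: List[Tuple[str, str, float]] = [
    ("Grok-4 Heavy", "xAI", 100.0),
    ("Claude Haiku 4.5", "Anthropic", 96.3),
    ("GPT-5", "OpenAI", 94.6),
    ("GLM-4.6", "Zhipu AI", 93.9),
    ("Grok-3", "xAI", 93.3),
    ("o4-mini", "OpenAI", 92.7),
    ("Qwen3-235B-A22B-Thinking-2507", "Alibaba Cloud / Qwen Team", 92.3),
    ("Grok 4 Fast", "xAI", 92.0),
    ("Grok-4", "xAI", 91.7),
    ("GPT-5 mini", "OpenAI", 91.1),
]


def mean_by_org(rows: List[Tuple[str, str, float]]) -> Dict[str, float]:
    """Group scores by organization and return each group's mean score."""
    scores_by_org: Dict[str, List[float]] = defaultdict(list)
    for _model, org, score_pct in rows:
        scores_by_org[org].append(score_pct)
    return {org: mean(scores) for org, scores in scores_by_org.items()}


org_means = mean_by_org(VISIBLE_ROWS)
for org, avg in sorted(org_means.items(), key=lambda item: item[1], reverse=True):
    print(f"{org}: {avg:.1f}%")
```

Zhipu AI has a single listed model (GLM-4.6 at 93.9%), so its mean here equals the 93.9% shown under Top Organizations, which is at least consistent with the averaging assumption.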


Resources

Research Paper