AIME 2024
About
The AIME 2024 benchmark evaluates AI models' mathematical reasoning on 15 problems from the 2024 American Invitational Mathematics Examination. This challenging test requires step-by-step problem solving across algebra, geometry, and number theory, with each answer being an integer from 000 to 999. The AIME is the exam that qualifies top high school students for the USAMO, so strong performance reflects olympiad-level mathematical capability. The benchmark uses exact-match scoring across multiple runs to assess advanced logical reasoning.
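A minimal sketch of how exact-match scoring averaged over runs could be computed, assuming answers are compared as zero-padded three-digit strings; the function names and data layout here are illustrative assumptions, not the benchmark's actual evaluation pipeline:

```python
# Hypothetical sketch of exact-match AIME scoring, averaged over runs.
# Assumes each run is a list of predicted answers aligned with the references.

def normalize(answer: str) -> str:
    """Normalize an AIME answer to a zero-padded three-digit string (000-999)."""
    return f"{int(answer):03d}"

def score_run(predictions: list[str], references: list[str]) -> float:
    """Fraction of problems answered exactly correctly in one run."""
    correct = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return correct / len(references)

def benchmark_score(runs: list[list[str]], references: list[str]) -> float:
    """Average exact-match accuracy across multiple runs, as a percentage."""
    return 100.0 * sum(score_run(run, references) for run in runs) / len(runs)
```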
Evaluation Stats
Total Models: 45
Organizations: 11
Verified Results: 0
Self-Reported: 45
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers

Score Distribution: 45 models
Top Score: 95.8%
Average Score: 73.2%
High Performers (80%+): 24

Top Organizations
#1 xAI: 2 models, 94.5%
#2 Zhipu AI: 2 models, 90.2%
#3 Google: 3 models, 84.4%
#4 IBM: 2 models, 81.2%
#5 Anthropic: 1 model, 80.0%
Leaderboard
45 models ranked by performance on AIME 2024
Release Date | License | Score
---|---|---
Feb 17, 2025 | Proprietary | 95.8%
Apr 16, 2025 | Proprietary | 93.4%
Feb 17, 2025 | Proprietary | 93.3%
May 20, 2025 | Proprietary | 92.0%
Apr 16, 2025 | Proprietary | 91.6%
May 28, 2025 | MIT | 91.4%
Jul 28, 2025 | MIT | 91.0%
Jul 28, 2025 | MIT | 89.4%
May 20, 2025 | Proprietary | 88.0%
Jan 30, 2025 | Proprietary | 87.3%
Showing 1 to 10 of 45 models
...