MATH-500


About

MATH-500 is a curated subset of 500 problems from the MATH benchmark of competition mathematics, covering subjects such as algebra, counting and probability, geometry, and precalculus (including trigonometry). At a tenth the size of the full 5,000-problem MATH test set, it allows faster evaluation of a model's mathematical reasoning across several domains while keeping the difficulty of competition-level problems.

Evaluation Stats

Total models: 25
Organizations: 9
Verified results: 0
Self-reported results: 25

Benchmark Details

Max score: 1 (displayed as 100%)
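A maximum score of 1 corresponds to answering every problem correctly. The sketch below illustrates how such an accuracy score can be computed, under simplifying assumptions: each model response states its final answer inside a LaTeX \boxed{...} (the convention used in MATH reference solutions), and an answer counts as correct only if it equals the reference after all whitespace is removed. The helpers `extract_boxed`, `normalize`, and `accuracy` are hypothetical; real graders use far more thorough equivalence checks (equivalent fractions, units, symbolic equality).

```python
from __future__ import annotations


def extract_boxed(text: str) -> str | None:
    """Return the contents of the last \\boxed{...} in text, handling nested braces."""
    start = text.rfind("\\boxed{")
    if start == -1:
        return None
    i = start + len("\\boxed{")
    depth = 1
    chars = []
    while i < len(text):
        c = text[i]
        if c == "{":
            depth += 1
        elif c == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars)
        chars.append(c)
        i += 1
    return None  # unbalanced braces: no usable answer


def normalize(answer: str) -> str:
    """Hypothetical minimal normalization: drop all whitespace."""
    return "".join(answer.split())


def accuracy(responses: list[str], references: list[str]) -> float:
    """Fraction of problems whose boxed answer matches the reference (max 1.0)."""
    if len(responses) != len(references):
        raise ValueError("responses and references must have the same length")
    correct = 0
    for response, reference in zip(responses, references):
        predicted = extract_boxed(response)
        if predicted is not None and normalize(predicted) == normalize(reference):
            correct += 1
    return correct / len(references)


responses = [
    "So the answer is \\boxed{\\frac{1}{2}}.",
    "The result is \\boxed{7}.",
    "I think it is 3.",  # no boxed answer, so it is scored as wrong
]
references = ["\\frac{1}{2}", "8", "3"]
print(accuracy(responses, references))  # prints 0.3333333333333333
```

If a single pass over all 500 problems is scored this way, each problem is worth 0.002, so a score of 98.2% would correspond to 491 correct answers.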

Performance Overview

Score distribution and top performers

Score Distribution

Models: 25
Top score: 98.2%
Average score: 92.1%


Top Organizations

#1 Zhipu AI: 98.2% (2 models)
#2 Moonshot AI: 97.0% (3 models)
#3 NVIDIA: 96.7% (4 models)
#4 Anthropic: 96.2% (1 model)
#5 Microsoft: 94.6% (1 model)
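The organization scores appear to be the mean of each organization's model scores rather than its best score: Moonshot AI's three leaderboard entries (97.4%, 97.4%, 96.2%) average exactly 97.0%, and Zhipu AI's two (98.2%, 98.1%) average 98.15%, shown as 98.2%. The sketch below checks this inference on a subset of the visible rows, using Python's `decimal` module to avoid binary floating-point rounding surprises. The round-half-up rule and the function name `org_averages` are assumptions, not the site's documented method. NVIDIA is left out because its fourth model is not among the ten rows shown.

```python
from collections import defaultdict
from decimal import Decimal, ROUND_HALF_UP

# (organization, score in percent) taken from the visible top-10 leaderboard rows
ROWS = [
    ("Zhipu AI", "98.2"),
    ("Zhipu AI", "98.1"),
    ("Moonshot AI", "97.4"),
    ("Moonshot AI", "97.4"),
    ("Anthropic", "96.2"),
    ("Moonshot AI", "96.2"),
]


def org_averages(rows):
    """Mean score per organization, rounded half-up to one decimal place (assumed rule)."""
    groups = defaultdict(list)
    for org, score in rows:
        groups[org].append(Decimal(score))
    return {
        org: (sum(scores) / len(scores)).quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)
        for org, scores in groups.items()
    }


averages = org_averages(ROWS)
for org, avg in sorted(averages.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{org}: {avg}%")
# Zhipu AI: 98.2%
# Moonshot AI: 97.0%
# Anthropic: 96.2%
```

Under this reading, NVIDIA's 96.7% sits below its best model's 97.8% because it averages all four of its models.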

Leaderboard

25 models ranked by performance on MATH-500

Rank	Model	Organization	Release date	License	Score
#01	GLM-4.5	Zhipu AI	Jul 28, 2025	MIT	98.2%
#02	GLM-4.5-Air	Zhipu AI	Jul 28, 2025	MIT	98.1%
#03	Nemotron Nano 9B v2	NVIDIA	Aug 18, 2025	NVIDIA Open Model License Agreement	97.8%
#04	Kimi K2 Instruct	Moonshot AI	Jul 11, 2025	MIT	97.4%
#05	Kimi K2-Instruct-0905	Moonshot AI	Sep 5, 2025	MIT	97.4%
#06	Llama 3.1 Nemotron Ultra 253B v1	NVIDIA	Apr 7, 2025	Llama 3.1 Community License	97.0%
#07	Llama-3.3 Nemotron Super 49B v1	NVIDIA	Mar 18, 2025	Llama 3.1 Community License	96.6%
#08	Claude 3.7 Sonnet	Anthropic	Feb 24, 2025	Proprietary	96.2%
#09	Kimi-k1.5	Moonshot AI	Jan 20, 2025	Proprietary	96.2%
#10	DeepSeek R1 Zero	DeepSeek	Jan 20, 2025	MIT	95.9%

Only the top 10 of the 25 ranked models are listed above.

Resources

Research Paper