MATH-500

About

MATH-500 is a curated subset of 500 diverse problems from the MATH benchmark, spanning probability, algebra, trigonometry, and geometry. The smaller set makes it cheaper to assess a model's mathematical reasoning across multiple domains while keeping representative coverage of problem-solving skills and the difficulty of competition-level mathematics.

Evaluation Stats
Total Models: 24
Organizations: 9
Verified Results: 0
Self-Reported: 24
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers

Score Distribution

Models: 24
Top Score: 98.2%
Average Score: 91.9%
High Performers (80%+): 22
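
The summary figures above are simple aggregates over the per-model scores. As a rough sketch of that arithmetic, the snippet below recomputes them from the ten scores visible on the first leaderboard page. The function name `summarize` and its `threshold` parameter are illustrative assumptions, as is the reading of "80%+" as "at or above 80". Because the remaining 14 models are not shown, the output will not match the page's 24-model figures.

```python
from statistics import mean

# The ten scores (in percent) visible on the first leaderboard page.
# The other 14 models are not listed, so these aggregates differ from
# the page-wide figures (for example the 91.9% average over 24 models).
visible_scores = [98.2, 98.1, 97.4, 97.4, 97.0, 96.6, 96.2, 96.2, 95.9, 95.4]


def summarize(scores, threshold=80.0):
    """Return the top score, the mean score, and the number of scores
    at or above `threshold` (assumed meaning of "High Performers")."""
    return {
        "top": max(scores),
        "average": round(mean(scores), 1),
        "high_performers": sum(1 for s in scores if s >= threshold),
    }


summary = summarize(visible_scores)
print(summary)
```

Running the same function over all 24 scores would be expected to reproduce the page's figures, provided the page uses a plain arithmetic mean, which is not stated.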

Top Organizations

#1 Zhipu AI (2 models): 98.2%
#2 Moonshot AI (3 models): 97.0%
#3 NVIDIA (3 models): 96.3%
#4 Anthropic (1 model): 96.2%
#5 Microsoft (1 model): 94.6%
Leaderboard
24 models ranked by performance on MATH-500
Date          License                      Score
Jul 28, 2025  MIT                          98.2%
Jul 28, 2025  MIT                          98.1%
Jul 11, 2025  MIT                          97.4%
Sep 5, 2025   MIT                          97.4%
Apr 7, 2025   Llama 3.1 Community License  97.0%
Mar 18, 2025  Llama 3.1 Community License  96.6%
Feb 24, 2025  Proprietary                  96.2%
Jan 20, 2025  Proprietary                  96.2%
Jan 20, 2025  MIT                          95.9%
Mar 18, 2025  Llama 3.1 Community License  95.4%
Showing 1 to 10 of 24 models
Resources