MATH-500
text
About
MATH-500 is a curated subset of 500 diverse problems from the MATH benchmark, spanning probability, algebra, trigonometry, and geometry. This streamlined evaluation set provides efficient assessment of AI models' mathematical reasoning capabilities across multiple domains, offering representative coverage of mathematical problem-solving skills while maintaining the challenging nature of competition-level mathematics.
Evaluation Stats
Total Models: 24
Organizations: 9
Verified Results: 0
Self-Reported: 24
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers
Score Distribution: 24 models
Top Score: 98.2%
Average Score: 91.9%
High Performers (80%+): 22
Top Organizations
#1 Zhipu AI: 2 models, 98.2%
#2 Moonshot AI: 3 models, 97.0%
#3 NVIDIA: 3 models, 96.3%
#4 Anthropic: 1 model, 96.2%
#5 Microsoft: 1 model, 94.6%
Leaderboard
24 models ranked by performance on MATH-500
Date | License | Score
---|---|---
Jul 28, 2025 | MIT | 98.2%
Jul 28, 2025 | MIT | 98.1%
Jul 11, 2025 | MIT | 97.4%
Sep 5, 2025 | MIT | 97.4%
Apr 7, 2025 | Llama 3.1 Community License | 97.0%
Mar 18, 2025 | Llama 3.1 Community License | 96.6%
Feb 24, 2025 | Proprietary | 96.2%
Jan 20, 2025 | Proprietary | 96.2%
Jan 20, 2025 | MIT | 95.9%
Mar 18, 2025 | Llama 3.1 Community License | 95.4%
Showing 1 to 10 of 24 models
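The overview figures (top score, average score, count of models at 80% or above) are simple aggregates over leaderboard scores. The sketch below shows one way such a summary could be computed, using only the ten scores visible on this page. The function name `summarize` and the `threshold` parameter are hypothetical, and because only 10 of the 24 models are shown here, the average it produces will not match the page-wide 91.9%.

```python
from statistics import mean

# Scores (in percent) from the ten leaderboard rows visible on this page.
# The remaining 14 models are not shown, so these are partial data.
visible_scores = [98.2, 98.1, 97.4, 97.4, 97.0, 96.6, 96.2, 96.2, 95.9, 95.4]


def summarize(scores, threshold=80.0):
    """Return simple aggregates over a list of percentage scores.

    `threshold` is the cutoff for counting a model as a high performer;
    80.0 mirrors the "High Performers (80%+)" label on the page.
    """
    return {
        "count": len(scores),
        "top": max(scores),
        "average": round(mean(scores), 1),
        "high_performers": sum(s >= threshold for s in scores),
    }


summary = summarize(visible_scores)
print(summary)
```

Run over all 24 scores instead of the visible ten, the same aggregation would be expected to yield the figures reported in the Performance Overview.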