HiddenMath

text

About

HiddenMath is a mathematical reasoning benchmark that evaluates how well AI models solve complex problems requiring deep analytical thinking. Its problems go well beyond simple arithmetic and test advanced mathematical reasoning and logical deduction.

Evaluation Stats

Total Models: 13
Organizations: 1
Verified Results: 0
Self-Reported: 13

Benchmark Details

Max Score: 1

Performance Overview

Score distribution and top performers

Score Distribution: 13 models
Top Score: 63.0%
Average Score: 42.7%

High Performers (80%+): none (the top score is 63.0%)

Top Organizations

#1 Google: 13 models, average score 42.7%

Leaderboard

13 models ranked by performance on HiddenMath

Rank	Model	Organization	Release Date	License	Score
#01	Gemini 2.0 Flash	Google	Dec 1, 2024	Proprietary	63.0%
#02	Gemma 3 27B	Google	Mar 12, 2025	Gemma	60.3%
#03	Gemini 2.0 Flash-Lite	Google	Feb 5, 2025	Proprietary	55.3%
#04	Gemma 3 12B	Google	Mar 12, 2025	Gemma	54.5%
#05	Gemini 1.5 Pro	Google	May 1, 2024	Proprietary	52.0%
#06	Gemini 1.5 Flash	Google	May 1, 2024	Proprietary	47.2%
#07	Gemma 3 4B	Google	Mar 12, 2025	Gemma	43.0%
#08	Gemma 3n E4B Instructed LiteRT Preview	Google	May 20, 2025	Gemma	37.7%
#09	Gemma 3n E4B Instructed	Google	Jun 26, 2025	Proprietary	37.7%
#10	Gemini 1.5 Flash 8B	Google	Mar 15, 2024	Proprietary	32.8%

The table lists the top 10 of the 13 evaluated models.
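The summary figures can be cross-checked against the listed rows. The following is a minimal Python sketch. It assumes the 10 listed scores are exact and that the reported 42.7% average covers all 13 models. The variable names are illustrative only. The mean of the listed rows (about 48.4%) is higher than 42.7%, which implies that the three unlisted models average roughly 24%. Because 42.7% is a rounded figure, that last estimate is approximate.

```python
from statistics import mean

# Scores (percent) of the 10 models listed in the leaderboard table.
listed_scores = {
    "Gemini 2.0 Flash": 63.0,
    "Gemma 3 27B": 60.3,
    "Gemini 2.0 Flash-Lite": 55.3,
    "Gemma 3 12B": 54.5,
    "Gemini 1.5 Pro": 52.0,
    "Gemini 1.5 Flash": 47.2,
    "Gemma 3 4B": 43.0,
    "Gemma 3n E4B Instructed LiteRT Preview": 37.7,
    "Gemma 3n E4B Instructed": 37.7,
    "Gemini 1.5 Flash 8B": 32.8,
}

# Assumption: the page's average is taken over all 13 models and rounded to 0.1.
REPORTED_AVERAGE = 42.7
TOTAL_MODELS = 13

listed_mean = mean(listed_scores.values())
top_model = max(listed_scores, key=listed_scores.get)
high_performers = [m for m, s in listed_scores.items() if s >= 80.0]

# Back out the combined score of the unlisted models from the reported average.
# Rounding in REPORTED_AVERAGE makes this an estimate, not an exact value.
implied_total = REPORTED_AVERAGE * TOTAL_MODELS
unlisted_count = TOTAL_MODELS - len(listed_scores)
implied_unlisted_mean = (implied_total - sum(listed_scores.values())) / unlisted_count

print(f"Top model: {top_model} ({listed_scores[top_model]:.1f}%)")
print(f"Mean of listed models: {listed_mean:.2f}%")
print(f"Models at 80%+: {len(high_performers)}")
print(f"Implied mean of the {unlisted_count} unlisted models: ~{implied_unlisted_mean:.1f}%")
```

Deriving the unlisted models' mean from the rounded average only bounds it loosely. A rounding error of ±0.05 points on a 13-model average shifts the estimate for three models by roughly ±0.2 points.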

Resources

Research Paper