MathVista

multimodal

About

MathVista is a benchmark for evaluating mathematical reasoning in visual contexts. It contains 6,141 examples drawn from 31 datasets: 28 existing multimodal datasets and 3 newly created ones (IQTest, FunctionQA, and PaperQA). It tests AI models' ability to reason mathematically over charts, plots, geometric figures, and scientific diagrams across multiple mathematical domains and visual representations.

Evaluation Stats

Total Models: 35

Organizations: 10

Verified Results: 0

Self-Reported: 33

Benchmark Details

Max Score: 1

Language

Performance Overview

Score distribution and top performers

Score Distribution: 35 models

Top Score: 86.8%

Average Score: 62.6%

High Performers (80%+)

Top Organizations

#1 Moonshot AI (1 model): 74.9%

#2 Alibaba Cloud / Qwen Team (2 models): 69.7%

#3 Anthropic (1 model): 67.7%

#4 Mistral AI (3 models): 64.8%

#5 OpenAI (11 models): 63.5%

Leaderboard

35 models ranked by performance on MathVista

Rank	Model	Organization	Date	License	Score
#01	o3	OpenAI	Apr 16, 2025	Proprietary	86.8%
#02	o4-mini	OpenAI	Apr 16, 2025	Proprietary	84.3%
#03	Kimi-k1.5	Moonshot AI	Jan 20, 2025	Proprietary	74.9%
#04	Llama 4 Maverick	Meta	Apr 5, 2025	Llama 4 Community License Agreement	73.7%
#05	GPT-4.1 mini	OpenAI	Apr 14, 2025	Proprietary	73.1%
#06	GPT-4.5	OpenAI	Feb 27, 2025	Proprietary	72.3%
#07	GPT-4.1	OpenAI	Apr 14, 2025	Proprietary	72.2%
#08	o1	OpenAI	Dec 17, 2024	Proprietary	71.8%
#09	QvQ-72B-Preview	Alibaba Cloud / Qwen Team	Dec 25, 2024	Qwen	71.4%
#10	Llama 4 Scout	Meta	Apr 5, 2025	Llama 4 Community License Agreement	70.7%

Showing 1 to 10 of 35 models
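
As a rough illustration of how the summary figures relate to the leaderboard, the sketch below tallies the ten rows shown above. The helper names `high_performers` and `mean_by_org` are hypothetical and not part of any MathVista tooling. Because only 10 of the 35 models are listed here, the per-organization means it produces are partial and will not match the figures under Top Organizations.

```python
from collections import defaultdict

# The ten visible leaderboard rows as (model, organization, score in percent).
ROWS = [
    ("o3", "OpenAI", 86.8),
    ("o4-mini", "OpenAI", 84.3),
    ("Kimi-k1.5", "Moonshot AI", 74.9),
    ("Llama 4 Maverick", "Meta", 73.7),
    ("GPT-4.1 mini", "OpenAI", 73.1),
    ("GPT-4.5", "OpenAI", 72.3),
    ("GPT-4.1", "OpenAI", 72.2),
    ("o1", "OpenAI", 71.8),
    ("QvQ-72B-Preview", "Alibaba Cloud / Qwen Team", 71.4),
    ("Llama 4 Scout", "Meta", 70.7),
]


def high_performers(rows, threshold=80.0):
    """Return the models whose score is at or above the threshold (hypothetical helper)."""
    return [model for model, _org, score in rows if score >= threshold]


def mean_by_org(rows):
    """Return the mean score per organization over the given rows only (hypothetical helper)."""
    groups = defaultdict(list)
    for _model, org, score in rows:
        groups[org].append(score)
    return {org: sum(scores) / len(scores) for org, scores in groups.items()}


if __name__ == "__main__":
    print("80%+ models:", high_performers(ROWS))
    for org, mean in sorted(mean_by_org(ROWS).items(), key=lambda kv: -kv[1]):
        print(f"{org}: {mean:.1f}% over visible rows")
```

Since the leaderboard is sorted by score and the third entry is already below 80%, the two models returned by `high_performers` are the only ones at or above that threshold among the ranked entries.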
