CharXiv-D

multimodal

About

CharXiv-D is a natural and challenging benchmark featuring charts collected from arXiv papers paired with human-curated questions for chart understanding evaluation. This dataset variant focuses on descriptive questions about chart content, testing AI models' ability to accurately interpret and describe visual data representations found in academic publications. CharXiv-D measures fundamental chart comprehension skills required for scientific document analysis.

Evaluation Stats

Total Models: 5

Organizations: 1

Verified Results: 0

Self-Reported: 5

Benchmark Details

Max Score: 1

Language

Performance Overview

Score distribution and top performers

Score Distribution

5 models

Top Score

90.0%

Average Score

85.1%

High Performers (80%+): 4

Top Organizations

#1 OpenAI

5 models

85.1%

Leaderboard

5 models ranked by performance on CharXiv-D

Rank	Model	Organization	Date	License	Score
#01	GPT-4.5	OpenAI	Feb 27, 2025	Proprietary	90.0%
#02	GPT-4.1 mini	OpenAI	Apr 14, 2025	Proprietary	88.4%
#03	GPT-4.1	OpenAI	Apr 14, 2025	Proprietary	87.9%
#04	GPT-4o	OpenAI	Aug 6, 2024	Proprietary	85.3%
#05	GPT-4.1 nano	OpenAI	Apr 14, 2025	Proprietary	73.9%
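
The summary figures in the Performance Overview can be cross-checked against the leaderboard rows. The sketch below is a minimal illustration, assuming the average is a plain arithmetic mean of the five listed scores and that "high performer" means a score of at least 80%; the page does not state how it computes these values, and the names `LEADERBOARD` and `summarize` are hypothetical.

```python
# Leaderboard rows transcribed from the table above: (model, score in percent).
LEADERBOARD = [
    ("GPT-4.5", 90.0),
    ("GPT-4.1 mini", 88.4),
    ("GPT-4.1", 87.9),
    ("GPT-4o", 85.3),
    ("GPT-4.1 nano", 73.9),
]


def summarize(rows, threshold=80.0):
    """Recompute the overview statistics from (model, score) rows.

    Assumption: the average is an unweighted mean rounded to one decimal,
    and the high-performer cutoff is inclusive at `threshold`.
    """
    scores = [score for _, score in rows]
    top_model, top_score = max(rows, key=lambda row: row[1])
    return {
        "total_models": len(rows),
        "top_model": top_model,
        "top_score": top_score,
        "average_score": round(sum(scores) / len(scores), 1),
        "high_performers": sum(1 for score in scores if score >= threshold),
    }


if __name__ == "__main__":
    for key, value in summarize(LEADERBOARD).items():
        print(f"{key}: {value}")
```

Under these assumptions the recomputed top score (90.0%) and average (85.1%) agree with the values shown in the overview.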
