CharXiv-R

multimodal

About

CharXiv-R is a reasoning-focused variant of the CharXiv benchmark that tests AI models' ability to perform complex reasoning over charts from academic papers. Unlike descriptive tasks, this benchmark requires models to analyze, compare, and draw conclusions from chart data through multi-step logical reasoning. CharXiv-R evaluates advanced analytical capabilities essential for understanding scientific visualizations and data-driven research.

Evaluation Stats

Total Models: 8

Organizations: 1

Verified Results: 0

Self-Reported: 8

Benchmark Details

Max Score: 1

Language

Performance Overview

Score distribution and top performers

Score Distribution

8 models

Top Score

81.1%

Average Score

62.5%

High Performers (80%+)

1

Top Organizations

#1 OpenAI

8 models

62.5%

Leaderboard

8 models ranked by performance on CharXiv-R

Rank	Model	Organization	Release Date	License	Score
#01	GPT-5	OpenAI	Aug 7, 2025	Proprietary	81.1%
#02	o3	OpenAI	Apr 16, 2025	Proprietary	78.6%
#03	o4-mini	OpenAI	Apr 16, 2025	Proprietary	72.0%
#04	GPT-4o	OpenAI	Aug 6, 2024	Proprietary	58.8%
#05	GPT-4.1 mini	OpenAI	Apr 14, 2025	Proprietary	56.8%
#06	GPT-4.1	OpenAI	Apr 14, 2025	Proprietary	56.7%
#07	GPT-4.5	OpenAI	Feb 27, 2025	Proprietary	55.4%
#08	GPT-4.1 nano	OpenAI	Apr 14, 2025	Proprietary	40.5%
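
The summary figures under Performance Overview can be reproduced from the leaderboard rows. The following is a minimal Python sketch, not the site's own code: the `LEADERBOARD` list and the `summarize` helper are hypothetical names introduced here, the scores are copied by hand from the table above, and the 80% cutoff is assumed to be inclusive.

```python
# Scores copied by hand from the leaderboard above (percent, self-reported).
# LEADERBOARD and summarize are illustrative names, not part of any official tool.
LEADERBOARD = [
    ("GPT-5", 81.1),
    ("o3", 78.6),
    ("o4-mini", 72.0),
    ("GPT-4o", 58.8),
    ("GPT-4.1 mini", 56.8),
    ("GPT-4.1", 56.7),
    ("GPT-4.5", 55.4),
    ("GPT-4.1 nano", 40.5),
]


def summarize(rows, high_threshold=80.0):
    """Return the headline statistics for a list of (model, score) rows."""
    scores = [score for _, score in rows]
    top_model, top_score = max(rows, key=lambda row: row[1])
    return {
        "total_models": len(rows),
        "top_model": top_model,
        "top_score": top_score,
        # Mean of all scores, rounded to one decimal place like the page.
        "average_score": round(sum(scores) / len(scores), 1),
        # Assumption: "80%+" means a score greater than or equal to 80.
        "high_performers": [name for name, score in rows if score >= high_threshold],
    }


if __name__ == "__main__":
    for key, value in summarize(LEADERBOARD).items():
        print(f"{key}: {value}")
```

Because all eight entries come from a single organization, the per-organization average listed under Top Organizations is the same number as the overall average.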

Resources

Research Paper