CharXiv-R

multimodal
About

CharXiv-R is a reasoning-focused variant of the CharXiv benchmark that tests AI models' ability to perform complex reasoning over charts from academic papers. Unlike descriptive tasks, this benchmark requires models to analyze, compare, and draw conclusions from chart data through multi-step logical reasoning. CharXiv-R evaluates advanced analytical capabilities essential for understanding scientific visualizations and data-driven research.

Evaluation Stats
Total Models: 8
Organizations: 1
Verified Results: 0
Self-Reported: 8
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers

Score Distribution

Models: 8
Top Score: 81.1%
Average Score: 62.5%
High Performers (80%+): 1

Top Organizations

#1 OpenAI (8 models, 62.5%)
Leaderboard
8 models ranked by performance on CharXiv-R
Date | License | Score
Aug 7, 2025 | Proprietary | 81.1%
Apr 16, 2025 | Proprietary | 78.6%
Apr 16, 2025 | Proprietary | 72.0%
Aug 6, 2024 | Proprietary | 58.8%
Apr 14, 2025 | Proprietary | 56.8%
Apr 14, 2025 | Proprietary | 56.7%
Feb 27, 2025 | Proprietary | 55.4%
Apr 14, 2025 | Proprietary | 40.5%
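
The summary figures in the Performance Overview can be recomputed from the eight leaderboard scores. The following is a minimal sketch, assuming the reported average is a plain arithmetic mean rounded to one decimal place and that "High Performers (80%+)" counts scores at or above 80%. The function name `summarize` and the `high_threshold` parameter are hypothetical and not part of the benchmark's tooling.

```python
from statistics import mean

# Scores copied from the leaderboard, in percent, highest first.
scores_pct = [81.1, 78.6, 72.0, 58.8, 56.8, 56.7, 55.4, 40.5]


def summarize(scores, high_threshold=80.0):
    """Recompute the overview statistics from a list of percentage scores.

    Assumption: the average is an unweighted mean rounded to one decimal,
    and a "high performer" is any score >= high_threshold.
    """
    return {
        "models": len(scores),
        "top": max(scores),
        "average": round(mean(scores), 1),
        "high_performers": sum(1 for s in scores if s >= high_threshold),
    }


summary = summarize(scores_pct)
print(summary)
```

Under these assumptions the result agrees with the figures listed under Score Distribution: 8 models, a top score of 81.1%, an average of 62.5%, and one high performer.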
Resources