CharXiv-D

multimodal
About

CharXiv-D is a natural and challenging chart-understanding benchmark built from charts collected from arXiv papers, each paired with human-curated questions. This variant focuses on descriptive questions about chart content, testing how accurately AI models interpret and describe the visual data representations found in academic publications. CharXiv-D therefore measures the fundamental chart-comprehension skills required for scientific document analysis.

Evaluation Stats
Total Models: 5
Organizations: 1
Verified Results: 0
Self-Reported: 5
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers

Score Distribution

5 models
Top Score: 90.0%
Average Score: 85.1%
High Performers (80%+): 4

Top Organizations

#1 OpenAI: 5 models, 85.1%
Leaderboard
5 models ranked by performance on CharXiv-D
Rank  Release Date  License      Score
1     Feb 27, 2025  Proprietary  90.0%
2     Apr 14, 2025  Proprietary  88.4%
3     Apr 14, 2025  Proprietary  87.9%
4     Aug 6, 2024   Proprietary  85.3%
5     Apr 14, 2025  Proprietary  73.9%
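The figures under Performance Overview follow directly from the five leaderboard scores. The minimal Python sketch below recomputes them. It makes two assumptions that this page does not state: scores are stored as fractions of the maximum score of 1, and the 80% high-performer cutoff is inclusive.

```python
# Leaderboard scores for CharXiv-D, as fractions of the max score (1).
scores = [0.900, 0.884, 0.879, 0.853, 0.739]

def summarize(scores, threshold=0.80):
    """Return (top score, mean score, count of scores at or above threshold).

    Assumption: the "High Performers (80%+)" cutoff is inclusive (>=).
    """
    top = max(scores)
    avg = sum(scores) / len(scores)
    high = sum(1 for s in scores if s >= threshold)
    return top, avg, high

top, avg, high = summarize(scores)
print(f"Top Score: {top:.1%}")
print(f"Average Score: {avg:.1%}")
print(f"High Performers (80%+): {high}")
```

Run as-is, this prints 90.0%, 85.1% and 4, matching the summary. Because all five models come from one organization, the organization average equals the overall average.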