CRUX-O

About

CRUX-O is a code understanding benchmark built around output prediction: given a program, a model must analyze its logic, trace the execution path, and accurately predict the output, all without running the code. This makes CRUX-O a fundamental evaluation tool for measuring code reasoning and execution simulation capabilities in language models.
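An output-prediction item typically pairs a short function with a concrete input, and the model must state what the call returns. A minimal illustrative example in Python (hypothetical, not drawn from the benchmark itself):

```python
# Hypothetical output-prediction item: the model reads the function and
# the call below, then must predict the return value without executing it.
def f(s):
    # Keep characters at even indices, then reverse the result.
    return s[::2][::-1]

# Task: predict the output of f("benchmark").
# Trace: "benchmark"[::2] -> "bnhak"; reversed -> "kahnb"
assert f("benchmark") == "kahnb"
```

Scoring such items is a simple string match between the model's predicted value and the true output, which is why the benchmark reports a single accuracy percentage.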

Evaluation Stats
Total Models: 1
Organizations: 1
Verified Results: 0
Self-Reported: 1
Benchmark Details
Max Score: 100
Language: English (en)
Performance Overview
Score distribution and top performers

Score Distribution: 1 model
Top Score: 79.0%
Average Score: 79.0%
High Performers (80%+): 0

Top Organizations

#1 Alibaba Cloud / Qwen Team (1 model, 79.0%)
Leaderboard
1 model ranked by performance on CRUX-O

Released: Apr 29, 2025 · License: Apache 2.0 · Score: 79.0%
Resources