CRUX-O

About

CRUX-O is a code understanding benchmark built around output prediction: given a program, a model must analyze its logic, trace the execution path, and accurately predict the output, all without running the code. This makes CRUX-O a fundamental evaluation tool for measuring code reasoning and execution simulation capabilities in language models.
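An output-prediction item typically pairs a short function with a concrete input, and the model must state what the call returns. A minimal illustrative example in Python (hypothetical, not drawn from the benchmark itself):

```python
# Hypothetical output-prediction item: the model reads the function and
# the call below, then must predict the return value without executing it.
def f(s):
    # Keep characters at even indices, then reverse the result.
    return s[::2][::-1]

# Task: predict the output of f("benchmark").
# Trace: "benchmark"[::2] -> "bnhak"; reversed -> "kahnb"
assert f("benchmark") == "kahnb"
```

Scoring such items is a simple string match between the model's predicted value and the true output, which is why the benchmark reports a single accuracy percentage.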

Evaluation Stats
Total Models: 1
Organizations: 1
Verified Results: 0
Self-Reported: 1
Benchmark Details
Max Score: 100
Language: English (en)
Performance Overview
Score distribution and top performers

Score Distribution: 1 model
Top Score: 79.0%
Average Score: 79.0%
High Performers (80%+): 0

Top Organizations

#1 Alibaba Cloud / Qwen Team (1 model, 79.0%)
Leaderboard
1 model ranked by performance on CRUX-O

Released: Apr 29, 2025 · License: Apache 2.0 · Score: 79.0%
Resources