CRUXEval-Input-CoT
About
CRUXEval-Input-CoT is a code reasoning benchmark that evaluates large language models' ability to predict function inputs using chain-of-thought reasoning. This variant of CRUXEval focuses on input prediction: given a function and an expected output, the model must work backwards to determine an input that produces that output. The benchmark tests reverse-engineering and logical-deduction skills in code understanding.
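As a rough illustration of the task format (a made-up item, not an actual benchmark sample), an input-prediction problem pairs a Python function with an expected output, and the model must reason step by step to supply an input that satisfies the assertion:

```python
# Hypothetical CRUXEval-style input-prediction item (illustrative only,
# not taken from the benchmark).

def f(lst):
    # Keep the even numbers, then reverse their order.
    return [x for x in lst if x % 2 == 0][::-1]

# The model is shown f and the expected output, and must fill in an
# input that makes the assertion pass, explaining its reasoning
# (chain of thought) before committing to an answer.
assert f([1, 2, 3, 4, 5, 6]) == [6, 4, 2]
```

A prediction counts as correct when executing the function on the proposed input reproduces the stated output.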
Evaluation Stats
Total Models: 1
Organizations: 1
Verified Results: 0
Self-Reported: 1
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers
Score Distribution: 1 model
Top Score: 56.5%
Average Score: 56.5%
High Performers (80%+): 0
Top Organizations:
#1 Alibaba Cloud / Qwen Team (1 model, 56.5%)
Leaderboard
1 model ranked by performance on CRUXEval-Input-CoT
Date | License | Score
---|---|---
Sep 19, 2024 | Apache 2.0 | 56.5%