CRUXEval-Input-CoT

About

CRUXEval-Input-CoT is a code reasoning benchmark that evaluates large language models' ability to predict function inputs using chain-of-thought reasoning. This variant of CRUXEval focuses on input prediction: given a Python function and an observed output, the model must work out an input that produces that output. The benchmark tests reverse-engineering and logical-deduction skills in code understanding.
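
For illustration, a task of this form pairs a short Python function with an observed output, and the model must reason backward to a consistent input. The function below is a hypothetical example in the style of CRUXEval, not an item from the benchmark itself:

    def f(lst):
        # Keep only the even numbers, then double each one.
        return [x * 2 for x in lst if x % 2 == 0]

    # Given the observed output [4, 8], the model must propose an input
    # such that f(input) == [4, 8]. One valid answer is [1, 2, 3, 4].
    assert f([1, 2, 3, 4]) == [4, 8]

In the chain-of-thought setting, the model is asked to write out its reasoning (for example, working backward from [4, 8] to the even elements 2 and 4) before committing to an input, and the predicted input is scored by checking whether the assertion holds.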

Evaluation Stats
Total Models: 1
Organizations: 1
Verified Results: 0
Self-Reported: 1
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers

Score Distribution: 1 model
Top Score: 56.5%
Average Score: 56.5%
High Performers (80%+): 0

Top Organizations

#1 Alibaba Cloud / Qwen Team: 1 model, 56.5%
Leaderboard
1 model ranked by performance on CRUXEval-Input-CoT
Date: Sep 19, 2024 | License: Apache 2.0 | Score: 56.5%
Resources