CRUXEval-Input-CoT
About
CRUXEval-Input-CoT is a code reasoning benchmark that evaluates large language models' ability to predict function inputs using chain-of-thought reasoning. This variant of CRUXEval focuses on input prediction: given a function and an expected output, the model must work backwards to determine an input that produces that output. The benchmark tests reverse-engineering and logical-deduction skills in code understanding.
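As a rough illustration of the task format (a made-up item, not an actual benchmark sample), an input-prediction problem pairs a Python function with an expected output, and the model must reason step by step to supply an input that satisfies the assertion:

```python
# Hypothetical CRUXEval-style input-prediction item (illustrative only,
# not taken from the benchmark).

def f(lst):
    # Keep the even numbers, then reverse their order.
    return [x for x in lst if x % 2 == 0][::-1]

# The model is shown f and the expected output, and must fill in an
# input that makes the assertion pass, explaining its reasoning
# (chain of thought) before committing to an answer.
assert f([1, 2, 3, 4, 5, 6]) == [6, 4, 2]
```

A prediction counts as correct when executing the function on the proposed input reproduces the stated output.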
Evaluation Stats
Total Models: 1
Organizations: 1
Verified Results: 0
Self-Reported: 1
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers
Score Distribution: 1 model
Top Score: 56.5%
Average Score: 56.5%
High Performers (80%+): 0
Top Organizations:
#1 Alibaba Cloud / Qwen Team (1 model, 56.5%)
Leaderboard
1 model ranked by performance on CRUXEval-Input-CoT
Date | License | Score
---|---|---
Sep 19, 2024 | Apache 2.0 | 56.5%