Natural2Code

About

Natural2Code is a benchmark for evaluating natural language to code generation capabilities in interactive data science notebooks. It tests models' ability to generate executable code from natural language descriptions, focusing on data science workflows, programming tasks, and the integration of code generation within computational notebook environments.
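To illustrate the kind of task the benchmark targets, here is a hypothetical example (not an actual Natural2Code item): a natural-language request that a model must turn into executable notebook code. The prompt wording and data are invented for illustration.

```python
# Hypothetical Natural2Code-style task (illustrative only, not a real
# benchmark item).
#
# Prompt: "Compute the mean of the 'price' column, ignoring missing values."
rows = [{"price": 10.0}, {"price": None}, {"price": 14.0}]

# A model-generated solution: filter out missing values, then average.
prices = [r["price"] for r in rows if r["price"] is not None]
mean_price = sum(prices) / len(prices)
print(mean_price)  # → 12.0
```

Evaluation then checks that the generated code executes and produces the expected result in the notebook environment.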

Evaluation Stats
Total Models: 8
Organizations: 1
Verified Results: 0
Self-Reported: 8
Benchmark Details
Max Score: 1
Language: English
Performance Overview
Score distribution and top performers

Score Distribution (8 models)
Top Score: 92.9%
Average Score: 78.1%
High Performers (80%+): 4

Top Organizations
#1 Google: 8 models, 78.1% average
Leaderboard
8 models ranked by performance on Natural2Code

Rank  Release Date  License      Score
1     Dec 1, 2024   Proprietary  92.9%
2     May 1, 2024   Proprietary  85.4%
3     Mar 12, 2025  Gemma        84.5%
4     Mar 12, 2025  Gemma        80.7%
5     May 1, 2024   Proprietary  79.8%
6     Mar 15, 2024  Proprietary  75.5%
7     Mar 12, 2025  Gemma        70.3%
8     Mar 12, 2025  Gemma        56.0%
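The summary figures in the Performance Overview follow directly from the eight leaderboard scores above; a minimal sketch that reproduces them (score list copied from the table, in rank order):

```python
# Scores from the Natural2Code leaderboard, in rank order.
scores = [92.9, 85.4, 84.5, 80.7, 79.8, 75.5, 70.3, 56.0]

top_score = max(scores)                           # 92.9 (Top Score)
average = round(sum(scores) / len(scores), 1)     # 78.1 (Average Score)
high_performers = sum(s >= 80.0 for s in scores)  # 4 models at 80%+

print(top_score, average, high_performers)  # → 92.9 78.1 4
```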
Resources