Natural2Code

About

Natural2Code is a benchmark for evaluating natural language to code generation capabilities in interactive data science notebooks. It tests models' ability to generate executable code from natural language descriptions, focusing on data science workflows, programming tasks, and the integration of code generation within computational notebook environments.
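To illustrate the kind of task the benchmark targets, here is a hypothetical example (not an actual Natural2Code item): a natural-language request that a model must turn into executable notebook code. The prompt wording and data are invented for illustration.

```python
# Hypothetical Natural2Code-style task (illustrative only, not a real
# benchmark item).
#
# Prompt: "Compute the mean of the 'price' column, ignoring missing values."
rows = [{"price": 10.0}, {"price": None}, {"price": 14.0}]

# A model-generated solution: filter out missing values, then average.
prices = [r["price"] for r in rows if r["price"] is not None]
mean_price = sum(prices) / len(prices)
print(mean_price)  # → 12.0
```

Evaluation then checks that the generated code executes and produces the expected result in the notebook environment.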

Evaluation Stats
Total Models: 8
Organizations: 1
Verified Results: 0
Self-Reported: 8
Benchmark Details
Max Score: 1
Language: English
Performance Overview
Score distribution and top performers

Score Distribution (8 models)
Top Score: 92.9%
Average Score: 78.1%
High Performers (80%+): 4

Top Organizations
#1 Google: 8 models, 78.1% average
Leaderboard
8 models ranked by performance on Natural2Code

Rank  Release Date  License      Score
1     Dec 1, 2024   Proprietary  92.9%
2     May 1, 2024   Proprietary  85.4%
3     Mar 12, 2025  Gemma        84.5%
4     Mar 12, 2025  Gemma        80.7%
5     May 1, 2024   Proprietary  79.8%
6     Mar 15, 2024  Proprietary  75.5%
7     Mar 12, 2025  Gemma        70.3%
8     Mar 12, 2025  Gemma        56.0%
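The summary figures in the Performance Overview follow directly from the eight leaderboard scores above; a minimal sketch that reproduces them (score list copied from the table, in rank order):

```python
# Scores from the Natural2Code leaderboard, in rank order.
scores = [92.9, 85.4, 84.5, 80.7, 79.8, 75.5, 70.3, 56.0]

top_score = max(scores)                           # 92.9 (Top Score)
average = round(sum(scores) / len(scores), 1)     # 78.1 (Average Score)
high_performers = sum(s >= 80.0 for s in scores)  # 4 models at 80%+

print(top_score, average, high_performers)  # → 92.9 78.1 4
```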
Resources