Codegolf v2.2

About

Codegolf v2.2 is a programming benchmark that evaluates how well AI models write extremely concise code that remains correct and functional. Its challenges call for solutions with a minimal character count, which tests code optimization, creative problem-solving, and depth of programming-language understanding.
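To make the idea of a minimal character count concrete, here is a small Python sketch. The task (summing the squares of 1..n), the function name `sum_of_squares`, and the helpers `load` and `same_behavior` are hypothetical illustrations, not taken from the benchmark, and the benchmark's actual scoring rules may differ.

```python
# Hypothetical task (not from the benchmark): sum the squares of 1..n.
# A readable reference solution, stored as source text so it can be measured.
readable = """
def sum_of_squares(n):
    total = 0
    for i in range(1, n + 1):
        total += i * i
    return total
"""

# A "golfed" solution to the same task, using the closed-form formula.
golfed = "sum_of_squares=lambda n:n*(n+1)*(2*n+1)//6"


def load(source):
    """Execute the source text and return the function it defines."""
    namespace = {}
    exec(source, namespace)
    return namespace["sum_of_squares"]


def same_behavior(a, b, inputs):
    """Check that two functions agree on every input in `inputs`."""
    return all(a(x) == b(x) for x in inputs)


readable_fn = load(readable)
golfed_fn = load(golfed)

# Brevity only counts if the short version is still correct.
is_equivalent = same_behavior(readable_fn, golfed_fn, range(0, 50))

# Character counts of the two solutions.
readable_len = len(readable.strip())
golfed_len = len(golfed)
```

The point of the sketch is the two-part check: the golfed version must first match the reference on every tested input, and only then does its shorter length matter.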

Evaluation Stats

Total Models: 4

Organizations: 1

Verified Results: 0

Self-Reported: 4

Benchmark Details

Max Score: 1

Language

Performance Overview

Score distribution and top performers

Score Distribution

4 models

Top Score: 16.8%

Average Score: 13.9%

High Performers (80%+): 0

Top Organizations

#1 Google: 4 models, 13.9%

Leaderboard

4 models ranked by performance on Codegolf v2.2

Rank	Model	Organization	Date	License	Score
#01	Gemma 3n E4B Instructed	Google	Jun 26, 2025	Proprietary	16.8%
#02	Gemma 3n E4B Instructed LiteRT Preview	Google	May 20, 2025	Gemma	16.8%
#03	Gemma 3n E2B Instructed	Google	Jun 26, 2025	Proprietary	11.0%
#04	Gemma 3n E2B Instructed LiteRT (Preview)	Google	May 20, 2025	Gemma	11.0%
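
The summary figures reported earlier on this page (top score, average score, and the count of models at 80% or above) can be reproduced from the four leaderboard rows. The sketch below assumes the scores are percentages and that the average is a plain arithmetic mean; the variable names are illustrative.

```python
from statistics import mean

# Scores copied from the leaderboard rows above, in percent.
scores = {
    "Gemma 3n E4B Instructed": 16.8,
    "Gemma 3n E4B Instructed LiteRT Preview": 16.8,
    "Gemma 3n E2B Instructed": 11.0,
    "Gemma 3n E2B Instructed LiteRT (Preview)": 11.0,
}

# Highest score among the listed models.
top_score = max(scores.values())

# Assumption: the page's average is an unweighted mean, rounded to one decimal.
average_score = round(mean(scores.values()), 1)

# Models at or above the page's 80% "high performer" threshold.
high_performers = [name for name, score in scores.items() if score >= 80.0]

print(f"Top score: {top_score}%")
print(f"Average score: {average_score}%")
print(f"High performers (80%+): {len(high_performers)}")
```

Under these assumptions the computed values agree with the page: a top score of 16.8%, an average of 13.9%, and no model at or above 80%.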