Aider-Polyglot Edit


About

Aider-Polyglot Edit is a code editing benchmark that evaluates how well language models modify and integrate code within existing Python codebases. It uses 133 Exercism coding exercises and tests both code completion accuracy and adherence to the required edit format. The skills it measures (file editing, code integration, and format compliance) are the ones a practical AI programming assistant needs.
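
To make "adherence to an edit format" concrete, here is a minimal sketch of how a harness might check and apply a model's edit. It assumes a search/replace block format similar to the one Aider uses. The marker strings, the single-match rule, and the names `parse_edit_block` and `apply_edit` are illustrative assumptions, not the benchmark's actual implementation.

```python
import re

# Hypothetical markers modeled on a search/replace edit format. The exact
# format expected in a given benchmark run is an assumption here.
EDIT_BLOCK = re.compile(
    r"<<<<<<< SEARCH\n(?P<search>.*?)\n=======\n(?P<replace>.*?)\n>>>>>>> REPLACE",
    re.DOTALL,
)


def parse_edit_block(reply):
    """Return (search, replace) if the reply holds a well-formed block, else None."""
    match = EDIT_BLOCK.search(reply)
    if match is None:
        return None
    return match.group("search"), match.group("replace")


def apply_edit(source, reply):
    """Apply the edit to source; raise ValueError on a format or match failure."""
    parsed = parse_edit_block(reply)
    if parsed is None:
        raise ValueError("reply does not follow the edit format")
    search, replace = parsed
    # Require exactly one occurrence so the edit location is unambiguous.
    if source.count(search) != 1:
        raise ValueError("search text must match the file exactly once")
    return source.replace(search, replace, 1)


original = "def add(a, b):\n    return a - b\n"
reply = (
    "<<<<<<< SEARCH\n"
    "    return a - b\n"
    "=======\n"
    "    return a + b\n"
    ">>>>>>> REPLACE"
)
edited = apply_edit(original, reply)
print(edited)
```

Under these assumptions a model can fail in two distinct ways: the reply may not parse as an edit block at all (a format failure), or it may parse but target text that is not in the file (an integration failure). A correct edit must still pass the exercise's tests afterwards.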

Evaluation Stats

Total Models: 10

Organizations: 3

Verified Results: 0

Self-Reported: 10

Benchmark Details

Max Score: 1

Language

Performance Overview

Score distribution and top performers

Score Distribution

10 models

Top Score

79.7%

Average Score

48.2%

High Performers (80%+): 0

Top Organizations

Rank	Organization	Models	Average Score
#1	DeepSeek	1	79.7%
#2	Google	2	64.7%
#3	OpenAI	7	38.9%

Leaderboard

10 models ranked by performance on Aider-Polyglot Edit

Rank	Model	Organization	Release Date	License	Score
#01	DeepSeek-V3	DeepSeek	Dec 25, 2024	MIT + Model License (Commercial use allowed)	79.7%
#02	Gemini 2.5 Pro	Google	May 20, 2025	Proprietary	72.7%
#03	o3-mini	OpenAI	Jan 30, 2025	Proprietary	60.4%
#04	o4-mini	OpenAI	Apr 16, 2025	Proprietary	58.2%
#05	Gemini 2.5 Flash	Google	May 20, 2025	Proprietary	56.7%
#06	GPT-4.1	OpenAI	Apr 14, 2025	Proprietary	52.9%
#07	GPT-4.5	OpenAI	Feb 27, 2025	Proprietary	44.9%
#08	GPT-4.1 mini	OpenAI	Apr 14, 2025	Proprietary	31.6%
#09	GPT-4o	OpenAI	Aug 6, 2024	Proprietary	18.2%
#10	GPT-4.1 nano	OpenAI	Apr 14, 2025	Proprietary	6.2%
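
The summary figures on this page (top score, average score, high performers, and per-organization averages) can be recomputed from the leaderboard rows. The sketch below is one way to do that. The `LEADERBOARD` list and the `summarize` function are illustrative names, and treating each organization's figure as a plain mean of its models' scores is an assumption, though it agrees with the values shown.

```python
from collections import defaultdict
from statistics import mean

# (model, organization, score in percent), copied from the leaderboard above.
LEADERBOARD = [
    ("DeepSeek-V3", "DeepSeek", 79.7),
    ("Gemini 2.5 Pro", "Google", 72.7),
    ("o3-mini", "OpenAI", 60.4),
    ("o4-mini", "OpenAI", 58.2),
    ("Gemini 2.5 Flash", "Google", 56.7),
    ("GPT-4.1", "OpenAI", 52.9),
    ("GPT-4.5", "OpenAI", 44.9),
    ("GPT-4.1 mini", "OpenAI", 31.6),
    ("GPT-4o", "OpenAI", 18.2),
    ("GPT-4.1 nano", "OpenAI", 6.2),
]


def summarize(rows, high_threshold=80.0):
    """Compute the page's summary statistics from (model, org, score) rows."""
    scores = [score for _, _, score in rows]
    by_org = defaultdict(list)
    for _, org, score in rows:
        by_org[org].append(score)
    return {
        "top": max(scores),
        "average": mean(scores),
        # Number of models at or above the "high performer" cutoff.
        "high_performers": sum(score >= high_threshold for score in scores),
        "org_average": {org: mean(vals) for org, vals in by_org.items()},
    }


summary = summarize(LEADERBOARD)
print(summary["top"], summary["high_performers"])
for org, avg in summary["org_average"].items():
    print(org, avg)
```

The ten scores sum to 481.5, so the mean is 48.15, which the page displays as 48.2%. No model reaches the 80% threshold, since the top score is 79.7%.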
