Aider-Polyglot

About

Aider-Polyglot is a coding benchmark that evaluates large language models on 225 challenging Exercism programming exercises in C++, Go, Java, JavaScript, Python, and Rust. It tests a model's ability to solve complex coding problems, edit existing code, and correct its own mistakes through a two-attempt methodology: if the first solution fails the exercise's unit tests, the model gets one more try after seeing the test output. The benchmark measures code generation accuracy, edit format compliance, and debugging capability.
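The two-attempt methodology can be sketched roughly as follows. This is a minimal illustration, not the benchmark's actual harness: `solve`, `run_tests`, `RunResult`, and `pass_rate` are hypothetical names introduced here, and the real harness also tracks things such as edit format compliance that this sketch ignores.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class RunResult:
    """Outcome of running one exercise's unit tests (hypothetical type)."""
    passed: bool
    test_output: str


def evaluate_exercise(
    solve: Callable[[Optional[str]], str],
    run_tests: Callable[[str], RunResult],
    max_attempts: int = 2,
) -> Optional[int]:
    """Return the attempt number that passed, or None if every attempt failed.

    `solve` receives the previous test output (None on the first attempt)
    and returns candidate code; `run_tests` executes the unit tests on it.
    Both are stand-ins for a model call and a test runner.
    """
    feedback: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        code = solve(feedback)
        result = run_tests(code)
        if result.passed:
            return attempt
        # On failure, the test output is fed back for the next attempt.
        feedback = result.test_output
    return None


def pass_rate(outcomes: Sequence[Optional[int]], within: int) -> float:
    """Fraction of exercises solved in at most `within` attempts."""
    solved = sum(1 for o in outcomes if o is not None and o <= within)
    return solved / len(outcomes)
```

Under these assumptions, `pass_rate(outcomes, within=2)` corresponds to the kind of percentage reported on the leaderboard, while `within=1` would isolate first-try success.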

Evaluation Stats

Total Models: 21

Organizations: 6

Verified Results: 0

Self-Reported: 21

Benchmark Details

Max Score: 1

Language

Performance Overview

Score distribution and top performers

Score Distribution

21 models

Top Score: 88.0%

Average Score: 58.0%

High Performers (80%+)

Top Organizations

#1 DeepSeek: 4 models, 66.0%
#2 Google: 4 models, 61.8%
#3 Moonshot AI: 2 models, 60.0%
#4 OpenAI: 8 models, 54.0%
#5 Alibaba Cloud / Qwen Team: 2 models, 53.5%

Leaderboard

21 models ranked by performance on Aider-Polyglot

Rank	Model	Organization	Release Date	License	Score
#01	GPT-5	OpenAI	Aug 7, 2025	Proprietary	88.0%
#02	Gemini 2.5 Pro Preview 06-05	Google	Jun 5, 2025	Proprietary	82.2%
#03	o3	OpenAI	Apr 16, 2025	Proprietary	81.3%
#04	Gemini 2.5 Pro	Google	May 20, 2025	Proprietary	76.5%
#05	DeepSeek-V3.2-Exp	DeepSeek	Sep 29, 2025	MIT	74.5%
#06	DeepSeek-R1-0528	DeepSeek	May 28, 2025	MIT	71.6%
#07	o4-mini	OpenAI	Apr 16, 2025	Proprietary	68.9%
#08	DeepSeek-V3.1	DeepSeek	Jan 10, 2025	MIT	68.4%
#09	o3-mini	OpenAI	Jan 30, 2025	Proprietary	66.7%
#10	Gemini 2.5 Flash	Google	May 20, 2025	Proprietary	61.9%

Showing 1 to 10 of 21 models
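As a rough illustration of how the summary figures relate to the table, the sketch below recomputes a few aggregates from the ten rows shown above. It covers only those ten rows, so its per-organization means will not match the Top Organizations figures, which are presumably computed over all 21 models. The names `ROWS`, `high_performers`, `visible_mean`, and `org_means` are introduced here for illustration.

```python
from collections import defaultdict
from statistics import mean

# (model, organization, score in percent) for the ten visible rows only.
ROWS = [
    ("GPT-5", "OpenAI", 88.0),
    ("Gemini 2.5 Pro Preview 06-05", "Google", 82.2),
    ("o3", "OpenAI", 81.3),
    ("Gemini 2.5 Pro", "Google", 76.5),
    ("DeepSeek-V3.2-Exp", "DeepSeek", 74.5),
    ("DeepSeek-R1-0528", "DeepSeek", 71.6),
    ("o4-mini", "OpenAI", 68.9),
    ("DeepSeek-V3.1", "DeepSeek", 68.4),
    ("o3-mini", "OpenAI", 66.7),
    ("Gemini 2.5 Flash", "Google", 61.9),
]


def high_performers(rows, threshold=80.0):
    """Models scoring at or above the threshold, in table order."""
    return [model for model, _, score in rows if score >= threshold]


def visible_mean(rows):
    """Mean score of the given rows, rounded to one decimal place."""
    return round(mean(score for _, _, score in rows), 1)


def org_means(rows):
    """Mean score per organization, rounded to one decimal place."""
    by_org = defaultdict(list)
    for _, org, score in rows:
        by_org[org].append(score)
    return {org: round(mean(scores), 1) for org, scores in by_org.items()}


if __name__ == "__main__":
    print(high_performers(ROWS))  # three models clear the 80% bar
    print(visible_mean(ROWS))     # 74.0 for the top ten rows
    print(org_means(ROWS))
```

The top-ten mean of 74.0% sits well above the reported 58.0% average across all 21 models, which is expected since the remaining eleven models all score below 61.9%.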
