Aider-Polyglot

About

Aider-Polyglot is a coding benchmark that evaluates large language models on 225 challenging Exercism programming exercises in C++, Go, Java, JavaScript, Python, and Rust. It tests a model's ability to solve complex coding problems, edit existing code, and correct its own mistakes, giving the model two attempts at each exercise. It measures code generation accuracy, edit format compliance, and debugging capability.
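The two-attempt methodology can be illustrated with a minimal sketch. This is not the benchmark's actual harness: `solve` and `run_tests` are hypothetical callables introduced here for illustration, and feeding the failing test output into the second attempt is an assumption of this sketch. Pass rates are returned as fractions between 0 and 1, in line with the Max Score of 1 listed under Benchmark Details.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical callables, not part of any real harness:
#   solve(exercise, feedback) -> candidate source code
#   run_tests(exercise, code) -> (passed, test_output)
Solve = Callable[[str, Optional[str]], str]
RunTests = Callable[[str, str], Tuple[bool, str]]


def attempts_needed(exercise: str, solve: Solve, run_tests: RunTests) -> Optional[int]:
    """Return 1 or 2 for the attempt that passed, or None if both failed."""
    feedback: Optional[str] = None
    for attempt in (1, 2):
        code = solve(exercise, feedback)
        passed, output = run_tests(exercise, code)
        if passed:
            return attempt
        # Assumption: the retry is shown the failing test output.
        feedback = output
    return None


def pass_rate(outcomes: List[Optional[int]], max_attempts: int) -> float:
    """Fraction of exercises solved within max_attempts tries."""
    solved = sum(1 for o in outcomes if o is not None and o <= max_attempts)
    return solved / len(outcomes)
```

Calling `pass_rate(outcomes, 1)` gives the share solved on the first try, and `pass_rate(outcomes, 2)` gives the share solved within both tries, which is the kind of figure a leaderboard percentage would correspond to under these assumptions.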

Evaluation Stats
Total Models: 21
Organizations: 6
Verified Results: 0
Self-Reported: 21
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers

Score Distribution

21 models
Top Score: 88.0%
Average Score: 58.0%
High Performers (80%+): 3

Top Organizations

#1 DeepSeek: 4 models, 66.0%
#2 Google: 4 models, 61.8%
#3 Moonshot AI: 2 models, 60.0%
#4 OpenAI: 8 models, 54.0%
#5 Alibaba Cloud / Qwen Team: 2 models, 53.5%
Leaderboard
21 models ranked by performance on Aider-Polyglot
Rank | Date | License | Score
1 | Aug 7, 2025 | Proprietary | 88.0%
2 | Jun 5, 2025 | Proprietary | 82.2%
3 | Apr 16, 2025 | Proprietary | 81.3%
4 | May 20, 2025 | Proprietary | 76.5%
5 | Sep 29, 2025 | MIT | 74.5%
6 | May 28, 2025 | MIT | 71.6%
7 | Apr 16, 2025 | Proprietary | 68.9%
8 | Jan 10, 2025 | MIT | 68.4%
9 | Jan 30, 2025 | Proprietary | 66.7%
10 | May 20, 2025 | Proprietary | 61.9%
Showing 1 to 10 of 21 models
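
As a quick cross-check of the summary figures, the sketch below recomputes two statistics from the ten leaderboard scores shown above. It covers only these ten rows, since the remaining 11 of the 21 models are not listed on this page; the variable names are illustrative.

```python
from statistics import mean

# Scores (in percent) of the ten leaderboard rows visible on this page.
visible_scores = [88.0, 82.2, 81.3, 76.5, 74.5, 71.6, 68.9, 68.4, 66.7, 61.9]

# Models at or above the 80% "high performer" threshold.
high_performers = [s for s in visible_scores if s >= 80.0]

# Mean of the visible rows only, rounded to one decimal place.
visible_mean = round(mean(visible_scores), 1)

print(len(high_performers))  # 3
print(visible_mean)          # 74.0
```

The count of 3 agrees with the High Performers (80%+) figure. The 74.0% mean of the visible rows is well above the reported 58.0% Average Score, which is expected because the lower-ranked 11 models are not included here.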
Resources