Aider-Polyglot
About
Aider-Polyglot is a comprehensive AI coding benchmark that evaluates large language models across 225 challenging Exercism programming exercises in C++, Go, Java, JavaScript, Python, and Rust. This multi-language benchmark tests models' ability to solve complex coding problems, edit existing code, and correct mistakes through a two-attempt methodology. It measures code generation accuracy, edit format compliance, and debugging capabilities.
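The two-attempt scoring described above can be sketched as follows. This is a minimal illustration, not the benchmark's actual harness: the `ExerciseResult` record, the `pass_rate` helper, and the exercise names are hypothetical, and it assumes an exercise counts as solved if its tests pass on either the first or the second attempt.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class ExerciseResult:
    """Hypothetical record of one exercise run (not the benchmark's real schema)."""
    name: str
    solved_on_attempt: Optional[int]  # 1 or 2 if tests passed, None if never solved


def pass_rate(results: Sequence[ExerciseResult], max_attempts: int = 2) -> float:
    """Fraction of exercises solved within `max_attempts` tries (0.0 to 1.0)."""
    if not results:
        return 0.0
    solved = sum(
        1
        for r in results
        if r.solved_on_attempt is not None and r.solved_on_attempt <= max_attempts
    )
    return solved / len(results)


# Illustrative data only; these are not real benchmark results.
results = [
    ExerciseResult("hypothetical-exercise-a", 1),
    ExerciseResult("hypothetical-exercise-b", 2),
    ExerciseResult("hypothetical-exercise-c", None),
    ExerciseResult("hypothetical-exercise-d", 1),
]

print(f"Solved on first attempt: {pass_rate(results, max_attempts=1):.1%}")
print(f"Solved within two attempts: {pass_rate(results, max_attempts=2):.1%}")
```

Comparing the one-attempt and two-attempt rates gives a rough view of how much a model gains from a chance to correct its mistakes.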
Evaluation Stats
Total Models: 21
Organizations: 6
Verified Results: 0
Self-Reported: 21
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers
Score Distribution
Models: 21
Top Score: 88.0%
Average Score: 58.0%
High Performers (80%+): 3

Top Organizations
#1 DeepSeek: 4 models, 66.0%
#2 Google: 4 models, 61.8%
#3 Moonshot AI: 2 models, 60.0%
#4 OpenAI: 8 models, 54.0%
#5 Alibaba Cloud / Qwen Team: 2 models, 53.5%
Leaderboard
21 models ranked by performance on Aider-Polyglot
Date | License | Score
---|---|---
Aug 7, 2025 | Proprietary | 88.0%
Jun 5, 2025 | Proprietary | 82.2%
Apr 16, 2025 | Proprietary | 81.3%
May 20, 2025 | Proprietary | 76.5%
Sep 29, 2025 | MIT | 74.5%
May 28, 2025 | MIT | 71.6%
Apr 16, 2025 | Proprietary | 68.9%
Jan 10, 2025 | MIT | 68.4%
Jan 30, 2025 | Proprietary | 66.7%
May 20, 2025 | Proprietary | 61.9%
Showing 1 to 10 of 21 models
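As a sanity check, the top score and the count of high performers reported in the Performance Overview can be re-derived from the ten visible leaderboard rows. The sketch below is illustrative only: the `summarize` helper is a hypothetical name, and the scores are copied by hand from the rows above rather than fetched from any API.

```python
from typing import Dict, Sequence

# Scores (in percent) copied from the ten visible leaderboard rows.
visible_scores = [88.0, 82.2, 81.3, 76.5, 74.5, 71.6, 68.9, 68.4, 66.7, 61.9]


def summarize(scores: Sequence[float], threshold: float = 80.0) -> Dict[str, float]:
    """Return the top score, the count at or above `threshold`, and the mean."""
    return {
        "top": max(scores),
        "high_performers": sum(1 for s in scores if s >= threshold),
        "mean": sum(scores) / len(scores),
    }


summary = summarize(visible_scores)
print(f"Top score: {summary['top']:.1f}%")
print(f"High performers (80%+): {summary['high_performers']}")
print(f"Mean of visible rows: {summary['mean']:.1f}%")
```

The mean here covers only the ten rows shown, so it is expected to be higher than the 58.0% average the page reports across all 21 models.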