Aider
About
Aider's polyglot coding benchmark evaluates language models on 225 programming exercises spanning C++, Go, Java, JavaScript, Python, and Rust. It tests code editing, instruction following, and real-world programming ability, with observed pass rates ranging from 3.6% to 88%, and provides a comprehensive measure of how well AI coding assistants translate natural language into executable code.
Evaluation Stats
Total Models: 4
Organizations: 2
Verified Results: 0
Self-Reported: 4
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers
Score Distribution: 4 models
Top Score: 72.2%
Average Score: 59.9%
High Performers (80%+): 0

Top Organizations
#1 DeepSeek: 1 model, 72.2%
#2 Alibaba Cloud / Qwen Team: 3 models, 55.9%
Leaderboard
4 models ranked by performance on Aider
Release Date | License | Score
---|---|---
May 8, 2024 | deepseek | 72.2%
Apr 29, 2025 | Apache 2.0 | 61.8%
Sep 19, 2024 | Apache 2.0 | 55.6%
Apr 29, 2025 | Apache 2.0 | 50.2%
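The summary figures in the Performance Overview follow directly from the four self-reported scores above. Below is a minimal sketch of that arithmetic; the score values come from this table, while the grouping by organization is assumed from the Top Organizations panel (model names are not listed here).

```python
# Self-reported Aider scores from the leaderboard above, grouped by organization
# (percentages; individual model names are not shown in the table).
scores = {
    "DeepSeek": [72.2],
    "Alibaba Cloud / Qwen Team": [61.8, 55.6, 50.2],
}

# Flatten all scores across organizations.
all_scores = [s for org_scores in scores.values() for s in org_scores]

top_score = max(all_scores)                          # 72.2
average_score = sum(all_scores) / len(all_scores)    # ~59.9
high_performers = sum(s >= 80 for s in all_scores)   # 0

print(f"Top Score: {top_score:.1f}%")
print(f"Average Score: {average_score:.1f}%")
print(f"High Performers (80%+): {high_performers}")

# Per-organization averages, matching the "Top Organizations" panel.
for org, org_scores in scores.items():
    avg = sum(org_scores) / len(org_scores)
    print(f"{org}: {len(org_scores)} model(s), {avg:.1f}% average")
```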