SWE-Lancer

text

About

SWE-Lancer is a software engineering benchmark of over 1,400 real freelance tasks from Upwork, worth $1 million in total payouts. It tests both independent engineering work and managerial decision-making. Because every task carries a real price, model performance maps directly to monetary value, and solutions are graded with end-to-end tests verified by experienced engineers.
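As a rough illustration of how performance can map to monetary value, the sketch below sums the payouts of the tasks a model solves. The `Task` class, the field names, and the three example payouts are hypothetical and chosen only for illustration; they are not taken from the benchmark's data or tooling.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """One freelance task (hypothetical structure, for illustration only)."""
    task_id: str
    payout_usd: float  # price attached to the task
    passed: bool       # whether the model's solution passed verification


def earnings(tasks):
    """Total payout of the tasks the model solved."""
    return sum(t.payout_usd for t in tasks if t.passed)


def earn_rate(tasks):
    """Fraction of the total available payout that the model earned."""
    total = sum(t.payout_usd for t in tasks)
    return earnings(tasks) / total if total else 0.0


# Made-up example tasks; these are NOT real SWE-Lancer tasks or payouts.
tasks = [
    Task("task-a", 250.0, True),
    Task("task-b", 1000.0, False),
    Task("task-c", 500.0, True),
]

print(f"earned ${earnings(tasks):.2f} ({earn_rate(tasks):.1%} of available payout)")
```

Weighting by payout means that solving one expensive task can count for more than solving several cheap ones, which is how a dollar-based score differs from a plain pass rate.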

Evaluation Stats

Total Models: 3

Organizations: 1

Verified Results: 0

Self-Reported: 3

Benchmark Details

Max Score: 1

Language

Performance Overview

Score distribution and top performers

Score Distribution

3 models

Top Score: 37.3%

Average Score: 29.3%

High Performers (80%+): 0

Top Organizations

#1 OpenAI: 3 models, 29.3% average score

Leaderboard

3 models ranked by performance on SWE-Lancer

Rank	Model	Organization	Date	License	Score
#01	GPT-4.5	OpenAI	Feb 27, 2025	Proprietary	37.3%
#02	GPT-4o	OpenAI	Aug 6, 2024	Proprietary	32.6%
#03	o3-mini	OpenAI	Jan 30, 2025	Proprietary	18.0%
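The summary figures on this page (top score, average score, high performers) can be recomputed from the three leaderboard rows above. This is a minimal sketch; the variable names are illustrative, and the 80% threshold is the one named in the "High Performers" label.

```python
from statistics import mean

# (model, score in percent), copied from the leaderboard rows above
leaderboard = [
    ("GPT-4.5", 37.3),
    ("GPT-4o", 32.6),
    ("o3-mini", 18.0),
]

scores = [score for _, score in leaderboard]

# Highest-scoring entry
top_model, top_score = max(leaderboard, key=lambda row: row[1])

# Arithmetic mean, rounded to one decimal like the page's figures
average = round(mean(scores), 1)

# Models at or above the 80% "high performer" threshold
high_performers = [model for model, score in leaderboard if score >= 80.0]

print(f"Top score: {top_model} at {top_score}%")
print(f"Average score: {average}%")
print(f"High performers (80%+): {len(high_performers)}")
```

The mean works out to (37.3 + 32.6 + 18.0) / 3 = 29.3, which matches the reported average, and no model reaches the 80% threshold.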

Resources

Research Paper