CodeForces
About
CodeForces is a competition-level programming benchmark that evaluates Large Language Models' reasoning through challenging algorithmic problems drawn from the CodeForces platform. Using real contest problems, it tests advanced problem-solving, algorithmic thinking, and code generation, and provides a human-comparable measure of AI coding ability in complex, contest-quality scenarios that demand sophisticated reasoning and optimization.
Evaluation Stats
Total Models: 6
Organizations: 3
Verified Results: 0
Self-Reported: 6
Benchmark Details
Max Score: 3000
Language: English (en)
Performance Overview
Score distribution and top performers

Score Distribution: 6 models
Top Score: 87.4%
Average Score: 73.6%
High Performers (80%+): 2

Top Organizations
#1 OpenAI — 2 models — 85.6%
#2 DeepSeek — 3 models — 68.2%
#3 Alibaba Cloud / Qwen Team — 1 model — 65.9%
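The summary figures above can be cross-checked from the six self-reported scores in the leaderboard table below. A minimal sketch (scores copied from that table; the page appears to truncate the mean rather than round it):

```python
# Self-reported CodeForces scores from the leaderboard (percent).
scores = [87.4, 83.9, 70.7, 69.7, 65.9, 64.3]

top = max(scores)                                  # top score: 87.4
average = sum(scores) / len(scores)                # ~73.6 as shown on the page
high_performers = sum(s >= 80.0 for s in scores)   # models scoring 80%+: 2

print(f"Top: {top}%  Avg: {average:.1f}%  80%+: {high_performers}")
```

This also confirms the "High Performers (80%+)" count of 2 (the two Apache 2.0 models released Aug 5, 2025).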
Leaderboard
6 models ranked by performance on CodeForces
Release Date | License | Score
---|---|---
Aug 5, 2025 | Apache 2.0 | 87.4%
Aug 5, 2025 | Apache 2.0 | 83.9%
Sep 29, 2025 | MIT | 70.7%
Jan 10, 2025 | MIT | 69.7%
Apr 29, 2025 | Apache 2.0 | 65.9%
May 28, 2025 | MIT | 64.3%