CodeForces
About
CodeForces is a competition-level programming benchmark that evaluates Large Language Models' reasoning through challenging algorithmic problems drawn from the CodeForces platform. Using real contest problems, it tests advanced problem-solving, algorithmic thinking, and code generation, and provides a human-comparable measure of AI coding ability in complex, contest-quality scenarios that demand sophisticated reasoning and optimization.
Evaluation Stats
Total Models: 6
Organizations: 3
Verified Results: 0
Self-Reported: 6
Benchmark Details
Max Score: 3000
Language: English (en)
Performance Overview
Score distribution and top performers

Score Distribution: 6 models
Top Score: 87.4%
Average Score: 73.6%
High Performers (80%+): 2

Top Organizations
#1 OpenAI — 2 models — 85.6%
#2 DeepSeek — 3 models — 68.2%
#3 Alibaba Cloud / Qwen Team — 1 model — 65.9%
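The summary figures above can be cross-checked from the six self-reported scores in the leaderboard table below. A minimal sketch (scores copied from that table; the page appears to truncate the mean rather than round it):

```python
# Self-reported CodeForces scores from the leaderboard (percent).
scores = [87.4, 83.9, 70.7, 69.7, 65.9, 64.3]

top = max(scores)                                  # top score: 87.4
average = sum(scores) / len(scores)                # ~73.6 as shown on the page
high_performers = sum(s >= 80.0 for s in scores)   # models scoring 80%+: 2

print(f"Top: {top}%  Avg: {average:.1f}%  80%+: {high_performers}")
```

This also confirms the "High Performers (80%+)" count of 2 (the two Apache 2.0 models released Aug 5, 2025).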
Leaderboard
6 models ranked by performance on CodeForces
Release Date | License | Score
---|---|---
Aug 5, 2025 | Apache 2.0 | 87.4%
Aug 5, 2025 | Apache 2.0 | 83.9%
Sep 29, 2025 | MIT | 70.7%
Jan 10, 2025 | MIT | 69.7%
Apr 29, 2025 | Apache 2.0 | 65.9%
May 28, 2025 | MIT | 64.3%