ARC-AGI-2

Reasoning
About

ARC-AGI-2 tests AI systems on novel abstract visual pattern-matching tasks that are intended to measure fluid intelligence. Humans score ~100%, while the best model on this leaderboard reaches 77.1%, a gap that makes the benchmark a key AGI milestone.

Evaluation Stats
Total Models: 8
Organizations: 3
Verified Results: 0
Self-Reported: 1
Benchmark Details
Max Score: 100
Performance Overview
Score distribution and top performers

Score Distribution

Models: 8
Top Score: 77.1%
Average Score: 46.8%
High Performers (80%+): 0
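
The summary figures above can be reproduced from the eight leaderboard scores listed on this page. The following is a minimal sketch, not the site's actual computation: the function name `summarize` and the `high_threshold` parameter are hypothetical, and it assumes the average is a plain arithmetic mean rounded to one decimal place.

```python
from statistics import mean

# Scores (%) copied from the leaderboard on this page.
scores = [77.1, 68.8, 58.3, 54.2, 37.6, 33.6, 31.1, 13.6]


def summarize(scores, high_threshold=80.0):
    """Summarize a list of percentage scores.

    Assumption: "High Performers" counts scores at or above the threshold,
    and the average is an unweighted mean rounded to one decimal.
    """
    return {
        "models": len(scores),
        "top": max(scores),
        "average": round(mean(scores), 1),
        "high_performers": sum(s >= high_threshold for s in scores),
    }


summary = summarize(scores)
print(summary)
```

Under these assumptions the result matches the figures shown: 8 models, a top score of 77.1, an average of 46.8, and no model at 80% or above.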

Top Organizations

#1 OpenAI (1 model): 54.2%
#2 Google DeepMind (3 models): 47.3%
#3 Anthropic (4 models): 44.6%
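
The organization figures appear to be per-organization mean scores. The sketch below checks that reading under stated assumptions: seven organization assignments follow the Author column of the Additional Metrics table on this page, while assigning the unnamed 13.6% entry to Anthropic is an assumption made because it is the only assignment that matches the listed model counts. The function name `org_averages` is hypothetical.

```python
from collections import defaultdict

# (organization, score %) pairs from this page.
entries = [
    ("Google DeepMind", 77.1),
    ("Anthropic", 68.8),
    ("Anthropic", 58.3),
    ("OpenAI", 54.2),
    ("Anthropic", 37.6),
    ("Google DeepMind", 33.6),
    ("Google DeepMind", 31.1),
    ("Anthropic", 13.6),  # assumed organization; the page does not name this model
]


def org_averages(entries):
    """Return (organization, model count, mean score) rows, best mean first."""
    by_org = defaultdict(list)
    for org, score in entries:
        by_org[org].append(score)
    rows = [
        (org, len(vals), round(sum(vals) / len(vals), 1))
        for org, vals in by_org.items()
    ]
    # Sort by mean score, highest first.
    return sorted(rows, key=lambda row: row[2], reverse=True)


ranking = org_averages(entries)
for rank, (org, count, avg) in enumerate(ranking, start=1):
    print(f"#{rank} {org} ({count} models): {avg}%")
```

With that assignment, the unweighted means reproduce the ranking shown: OpenAI at 54.2, Google DeepMind at 47.3, and Anthropic at 44.6.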
Leaderboard
8 models ranked by performance on ARC-AGI-2
Date           License       Score
Feb 19, 2026   Proprietary   77.1%
Feb 1, 2026    Proprietary   68.8%
Feb 17, 2026   Proprietary   58.3%
Dec 11, 2025   Proprietary   54.2%
Nov 1, 2025    Proprietary   37.6%
Dec 17, 2025   Proprietary   33.6%
Nov 18, 2025   Proprietary   31.1%
Sep 29, 2025   Proprietary   13.6%
Additional Metrics
Extended metrics for top models on ARC-AGI-2
Model               Score   Cost/Task   Author      ARC-AGI-1   System Type
Gemini 3.1 Pro      77.1    $0.962      Google      98%         CoT
Claude Opus 4.6     68.8    $2.25       Anthropic   86%         CoT
Claude Sonnet 4.6   58.3    $2.72       Anthropic   86%         CoT
GPT-5.2             54.2    $8.99       OpenAI      81.2%       CoT
Claude Opus 4.5     37.6    $2.4        Anthropic   80%         CoT
Gemini 3 Flash      33.6    $0.231      Google      84.7%       CoT
Gemini 3 Pro        31.1    $77.16      Google      87.5%       CoT
Resources