Comprehensive side-by-side LLM comparison
GPT-4.1 leads with 9.7% higher average benchmark score. GPT-4.1 offers 680.3K more tokens in context window than Claude 3 Opus. GPT-4.1 is $80.00 cheaper per million tokens. Claude 3 Opus is available on 3 providers. Overall, GPT-4.1 is the stronger choice for coding tasks.
Anthropic
Claude 3 Opus was developed as the most capable model in the Claude 3 family, designed to set new industry benchmarks across a wide range of cognitive tasks. Built to handle complex analysis and extended tasks requiring deep reasoning, it balanced frontier intelligence with careful safety considerations, representing the flagship tier of the Claude 3 generation.
OpenAI
GPT-4.1 represents an iterative improvement in the GPT-4 series, developed to refine the foundational capabilities established by GPT-4. Built to incorporate learnings and optimizations from the deployment of previous versions, it continues the evolution of OpenAI's flagship model line with enhanced reliability and performance.
1 year newer

Claude 3 Opus
Anthropic
2024-02-29

GPT-4.1
OpenAI
2025-04-14
Cost per million tokens (USD)

Claude 3 Opus

GPT-4.1
Context window and performance specifications
Average performance across 2 common benchmarks

Claude 3 Opus

GPT-4.1
GPT-4.1
2024-06-01
Available providers and their performance metrics

Claude 3 Opus
Anthropic
Bedrock

Claude 3 Opus

GPT-4.1

Claude 3 Opus

GPT-4.1

GPT-4.1
OpenAI