Comprehensive side-by-side LLM comparison
Claude 3.7 Sonnet leads with a 32.0% higher average benchmark score and is available on 4 providers. Overall, Claude 3.7 Sonnet is the stronger choice for coding tasks.
Anthropic
Claude 3.7 Sonnet is Anthropic's first hybrid reasoning model, capable of producing near-instant responses or extended, step-by-step thinking that is visible to users. It brings particularly strong improvements in coding and front-end web development, lets users control the thinking budget, and balances real-world task performance with reasoning capability for enterprise applications.
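To illustrate the thinking-budget control, here is a minimal Python sketch using the official anthropic SDK. It requests extended thinking with an explicit budget_tokens limit; the prompt is illustrative, and the model identifier should be confirmed against Anthropic's current model listing.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Request extended thinking with an explicit token budget. max_tokens must
# exceed budget_tokens, since the final answer shares the same overall limit.
response = client.messages.create(
    model="claude-3-7-sonnet-20250219",
    max_tokens=4096,
    thinking={"type": "enabled", "budget_tokens": 2048},
    messages=[
        {"role": "user", "content": "Plan a migration of this Flask app to FastAPI."}
    ],
)

# The reply interleaves visible "thinking" blocks with the final "text" blocks.
for block in response.content:
    if block.type == "thinking":
        print("[thinking]", block.thinking)
    elif block.type == "text":
        print(block.text)
```

A larger budget generally buys more deliberate reasoning at the cost of latency and tokens, which is the trade-off the hybrid design is meant to expose.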
Microsoft
Phi-3.5 Vision was developed as a multimodal variant of Phi-3.5, designed to understand and reason about both images and text. Built to extend the Phi family's efficiency into vision-language tasks, it enables compact multimodal AI for practical applications.
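To show what running it locally looks like, the sketch below loads Phi-3.5-vision-instruct with Hugging Face transformers and asks a question about a single image. The image URL is a placeholder, and the prompt follows the <|image_1|> template documented on the model card.

```python
# Sketch: local inference with Phi-3.5-vision-instruct via Hugging Face transformers.
import requests
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "microsoft/Phi-3.5-vision-instruct"

# The Phi vision checkpoints ship custom modeling code, hence trust_remote_code.
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", torch_dtype="auto", trust_remote_code=True
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

# One image plus a question; <|image_1|> marks where the image is injected.
image = Image.open(requests.get("https://example.com/chart.png", stream=True).raw)
prompt = "<|user|>\n<|image_1|>\nWhat trend does this chart show?<|end|>\n<|assistant|>\n"

inputs = processor(prompt, [image], return_tensors="pt").to(model.device)
output_ids = model.generate(
    **inputs, max_new_tokens=256, eos_token_id=processor.tokenizer.eos_token_id
)

# Strip the prompt tokens and decode only the newly generated answer.
answer = processor.decode(
    output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(answer)
```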
Phi-3.5-vision-instruct (Microsoft): released 2024-08-23
Claude 3.7 Sonnet (Anthropic): released 2025-02-24 (6 months newer)
Context window and performance specifications
Average performance across 1 common benchmark

Available providers and their performance metrics

Claude 3.7 Sonnet: Anthropic, Amazon Bedrock, ZeroEval
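Availability on Amazon Bedrock means the model can also be invoked through the Bedrock runtime rather than Anthropic's own API. Below is a minimal boto3 sketch using the Converse API; the modelId is an assumption (some regions require a region-prefixed inference-profile ID), so check the Bedrock model catalog for your account.

```python
# Sketch: calling Claude 3.7 Sonnet through Amazon Bedrock's Converse API with boto3.
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

response = bedrock.converse(
    # Assumed identifier; some regions use "us.anthropic.claude-3-7-sonnet-20250219-v1:0".
    modelId="anthropic.claude-3-7-sonnet-20250219-v1:0",
    messages=[
        {
            "role": "user",
            "content": [{"text": "Summarize the trade-offs of hybrid reasoning models."}],
        }
    ],
    inferenceConfig={"maxTokens": 1024, "temperature": 0.3},
)

# The Converse API returns the assistant turn as a list of content blocks.
print(response["output"]["message"]["content"][0]["text"])
```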
