Comprehensive side-by-side LLM comparison
GPT-4.1 leads Mistral Small 3 24B Base with a 20.7% higher average benchmark score across 2 common benchmarks. Overall, GPT-4.1 is the stronger choice for coding tasks.
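The page does not say whether the 20.7% figure is a relative lead or a gap in percentage points, and the two readings can differ a lot. The sketch below computes both from per-benchmark scores. The function names and the scores are hypothetical placeholders chosen for illustration, not the data behind this comparison.

```python
def mean(scores):
    """Arithmetic mean of a non-empty list of benchmark scores."""
    return sum(scores) / len(scores)


def absolute_gap(a, b):
    """Difference between average scores, in percentage points."""
    return mean(a) - mean(b)


def relative_gap_pct(a, b):
    """How much higher a's average is than b's, as a percent of b's average."""
    mean_a, mean_b = mean(a), mean(b)
    return 100.0 * (mean_a - mean_b) / mean_b


# Hypothetical placeholder scores (percent) on two shared benchmarks.
# These are NOT the scores used by the comparison page.
model_a = [80.0, 60.0]
model_b = [60.0, 40.0]

print(f"absolute gap: {absolute_gap(model_a, model_b):.1f} points")
print(f"relative gap: {relative_gap_pct(model_a, model_b):.1f}%")
```

With these placeholder numbers the same pair of models shows a 20-point gap but a 40% relative lead, so a headline percentage is easier to interpret when the page states which definition it uses.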
OpenAI
GPT-4.1 is an iterative update in the GPT-4 series that refines the capabilities established by GPT-4. It incorporates lessons and optimizations from the deployment of earlier versions, with a focus on reliability and performance.
Mistral AI
Mistral Small 3 24B Base is a 24-billion-parameter foundation model designed as a base for fine-tuning and customization. It offers a starting point for domain-specific applications and sits at an intermediate scale in Mistral's model lineup.
Release dates: GPT-4.1 is about 2 months newer

Mistral Small 3 24B Base
Mistral AI
2025-01-30

GPT-4.1
OpenAI
2025-04-14
Context window and performance specifications
Average performance across 2 common benchmarks

[Chart: average benchmark score, GPT-4.1 vs. Mistral Small 3 24B Base]
Knowledge cutoff
Mistral Small 3 24B Base: 2023-10-01
GPT-4.1: 2024-06-01
Available providers and their performance metrics

GPT-4.1
OpenAI

Mistral Small 3 24B Base
