Compare the top 5 language models by average benchmark performance
Note: This comparison shows the top 5 models ranked by average benchmark score; support for selecting specific models to compare is planned for a future update.
| Feature | Grok-3 Mini | Mistral Large 2 | Grok-3 | Claude 3.5 Sonnet | Kimi K2 0905 |
|---|---|---|---|---|---|
| Organization | xAI | Mistral AI | xAI | Anthropic | Moonshot AI |
| Release Date | 2025-02-17 | 2024-07-24 | 2025-02-17 | 2024-06-21 | 2025-09-05 |
| License | Proprietary | Mistral Research License | Proprietary | Proprietary | Modified MIT |
| Multimodal | | | | | |
| Average Score | 87.8% | 87.6% | 85.7% | 84.1% | 84.0% |
| AIME 2024 | 95.8% | | 93.3% | | 72.0% |
| AIME 2025 | 90.8% | | 93.3% | | |
| GPQA | 84.0% | | 84.6% | 59.4% | 75.8% |
| LiveCodeBench | 80.4% | | 79.4% | | |
| GSM8k | | 93.0% | | 96.4% | |
| HumanEval | | 92.0% | | 92.0% | 94.5% |
| MMLU | | 84.0% | | 90.4% | 90.2% |
| MMLU French | | 82.8% | | | |
| MT-Bench | | 86.3% | | | |
| MMMU | | | 78.0% | | |
| BIG-Bench Hard | | | | 93.1% | |
| DROP | | | | 87.1% | |
| MATH | | | | 71.1% | 89.1% |
| MGSM | | | | 91.6% | |
| MMLU-Pro | | | | 76.1% | 82.5% |
| Min Input Price ($ per 1M tokens) | $0.30 | $2.00 | $3.00 | $3.00 | $0.60 |
| Min Output Price ($ per 1M tokens) | $0.50 | $6.00 | $15.00 | $15.00 | $2.50 |

Providers and data sources referenced: xAI, Mistral AI, Bedrock, Novita, ZeroEval.
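For readers who want to reproduce the ranking: the Average Score row is consistent with an equal-weight mean over only the benchmarks each model reports (benchmark coverage differs per model). The Python sketch below recomputes the averages and the top-5 ordering from the per-benchmark scores copied out of the table. The equal-weight averaging rule is inferred from the table's own numbers, not a documented method, and the cost-estimate workload at the end is a made-up example.

```python
from statistics import mean

# Benchmark scores (%) per model, copied from the table above.
scores = {
    "Grok-3 Mini": {
        "AIME 2024": 95.8, "AIME 2025": 90.8, "GPQA": 84.0, "LiveCodeBench": 80.4,
    },
    "Mistral Large 2": {
        "GSM8k": 93.0, "HumanEval": 92.0, "MMLU": 84.0, "MMLU French": 82.8, "MT-Bench": 86.3,
    },
    "Grok-3": {
        "AIME 2024": 93.3, "AIME 2025": 93.3, "GPQA": 84.6, "LiveCodeBench": 79.4, "MMMU": 78.0,
    },
    "Claude 3.5 Sonnet": {
        "GPQA": 59.4, "GSM8k": 96.4, "HumanEval": 92.0, "MMLU": 90.4, "BIG-Bench Hard": 93.1,
        "DROP": 87.1, "MATH": 71.1, "MGSM": 91.6, "MMLU-Pro": 76.1,
    },
    "Kimi K2 0905": {
        "AIME 2024": 72.0, "GPQA": 75.8, "HumanEval": 94.5, "MMLU": 90.2, "MATH": 89.1, "MMLU-Pro": 82.5,
    },
}

# Average each model over only the benchmarks it reports, then rank descending.
ranking = sorted(
    ((model, mean(bench.values())) for model, bench in scores.items()),
    key=lambda pair: pair[1],
    reverse=True,
)
for rank, (model, avg) in enumerate(ranking, start=1):
    print(f"{rank}. {model}: {avg:.1f}%")

# Illustrative cost estimate using the table's minimum prices ($ per 1M tokens).
# The 2M-input / 0.5M-output workload is hypothetical, not from the source.
grok3_mini_in, grok3_mini_out = 0.30, 0.50
print(f"Grok-3 Mini, 2M in + 0.5M out: ${2.0 * grok3_mini_in + 0.5 * grok3_mini_out:.2f}")  # $0.85
```

Because each model is averaged over a different benchmark mix, the averages are not strictly comparable across models; treat small gaps such as 84.1% vs. 84.0% as noise rather than a meaningful ranking difference.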