Comprehensive side-by-side LLM comparison
Grok-4 Heavy leads with a 12.6% higher average benchmark score, measured on the one benchmark the two models have in common. Grok-4 Heavy supports multimodal inputs. Kimi K2 0905 is available from 2 providers. Overall, Grok-4 Heavy is the stronger choice for coding tasks.
xAI
Grok 4 Heavy was created as a larger, more capable variant of Grok 4, designed to provide maximum reasoning depth and analytical capability. Built for the most demanding applications requiring sophisticated problem-solving, it represents xAI's flagship offering in the fourth generation.
Moonshot AI
Kimi K2 was introduced as the second generation of Moonshot's language model family, designed to provide enhanced capabilities across language understanding and generation. Built with architectural improvements and expanded training, it represents a significant advancement in Moonshot's model offerings.
Release dates (Kimi K2 0905 is newer by just under two months):

Grok-4 Heavy (xAI): 2025-07-09
Kimi K2 0905 (Moonshot AI): 2025-09-05
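
The gap between the two release dates listed here (2025-07-09 and 2025-09-05) can be checked with a short calculation. The sketch below is illustrative only: the variable names are made up for this example, and the "completed calendar months" rule is an assumption about how a "months newer" label might be derived, not something the page states.

```python
from datetime import date

# Release dates as listed in this comparison.
grok_4_heavy_release = date(2025, 7, 9)
kimi_k2_0905_release = date(2025, 9, 5)

# Exact gap in days between the two releases.
gap_days = (kimi_k2_0905_release - grok_4_heavy_release).days

# Completed calendar months (assumed rule): count month boundaries crossed,
# then subtract one if the day-of-month has not yet been reached.
whole_months = (
    (kimi_k2_0905_release.year - grok_4_heavy_release.year) * 12
    + (kimi_k2_0905_release.month - grok_4_heavy_release.month)
)
if kimi_k2_0905_release.day < grok_4_heavy_release.day:
    whole_months -= 1

print(f"Gap: {gap_days} days, {whole_months} completed calendar month(s)")
```

Under that assumed rule the gap counts as one completed month, even though in days it is closer to two months.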
Context window and performance specifications
Average performance on 1 common benchmark

[Chart: average benchmark score, Grok-4 Heavy vs. Kimi K2 0905; values not recovered]

Grok-4 Heavy
2024-12-31
Available providers and their performance metrics

Grok-4 Heavy

Kimi K2 0905
Novita

ZeroEval