
Kimi K2 Instruct
Zero-eval
#1 GSM8k
#1 CBNSL
#1 AutoLogi
+18 more
by Moonshot AI
About
Kimi K2 Instruct is a language model developed by Moonshot AI. It averages 66.7% across 38 benchmarks, with its highest scores on MATH-500 (97.4%), GSM8k (97.3%), and CBNSL (95.6%); the scores listed here are self-reported. It supports a 262K-token context window, which suits long documents. The model is available through one API provider. It is released under the MIT license, which permits commercial use. It was announced and released on July 11, 2025.
Pricing Range
Input (per 1M tokens): $0.57
Output (per 1M tokens): $2.30
Providers: 1
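The per-million-token prices above translate into a per-request cost with simple arithmetic. The sketch below is only an illustration: the function name `estimate_cost` and the example token counts are hypothetical, and the prices are the ones listed on this page, which may differ from what a provider actually bills.

```python
# Listed prices from the Pricing Range section (USD per 1M tokens).
INPUT_PRICE_PER_M = 0.57
OUTPUT_PRICE_PER_M = 2.30


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request at the listed prices.

    Hypothetical helper: it ignores caching discounts, minimum charges,
    and any provider-specific billing rules.
    """
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical request: 200,000 prompt tokens and 2,000 completion tokens.
    print(f"${estimate_cost(200_000, 2_000):.4f}")
```

Because output tokens cost roughly four times as much as input tokens at these prices, long completions can dominate the bill even when the prompt is large.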
Timeline
Announced: Jul 11, 2025
Released: Jul 11, 2025
Specifications
Training Tokens: 15.5T
License & Family
License: MIT
Base Model: Kimi K2 Base
Performance Overview
Performance metrics and category breakdown
Overall Performance
38 benchmarks
Average Score
66.7%
Best Score
97.4%
High Performers (80%+)
12
Performance Metrics
Max Context Window
262.1K
Avg Throughput
45.0 tok/s
Avg Latency
1ms
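As a rough guide to what the throughput and latency figures imply, generation time can be approximated as latency plus output length divided by throughput. This is a minimal sketch under that simplifying assumption; `estimate_generation_seconds` is a hypothetical helper, and the constants are the averages listed above, not guarantees.

```python
# Averages as listed under Performance Metrics.
AVG_THROUGHPUT_TOK_PER_S = 45.0
AVG_LATENCY_S = 0.001  # 1 ms


def estimate_generation_seconds(output_tokens: int) -> float:
    """Approximate wall-clock time to stream a completion.

    Assumes constant throughput after an initial latency, which real
    deployments only approximate.
    """
    return AVG_LATENCY_S + output_tokens / AVG_THROUGHPUT_TOK_PER_S


if __name__ == "__main__":
    # Hypothetical 900-token completion.
    print(f"{estimate_generation_seconds(900):.3f} s")
```

At the listed throughput, output length accounts for almost all of the estimate, so the latency term matters only for very short responses.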
All Benchmark Results for Kimi K2 Instruct
Complete list of benchmark scores with detailed information
Benchmark | Modality | Score (0-1) | Score (%) | Source
MATH-500 | text | 0.97 | 97.4% | Self-reported
GSM8k | text | 0.97 | 97.3% | Self-reported
CBNSL | text | 0.96 | 95.6% | Self-reported
HumanEval | text | 0.93 | 93.3% | Self-reported
MMLU-Redux | text | 0.93 | 92.7% | Self-reported
IFEval | text | 0.90 | 89.8% | Self-reported
MMLU | text | 0.90 | 89.5% | Self-reported
AutoLogi | text | 0.90 | 89.5% | Self-reported
ZebraLogic | text | 0.89 | 89.0% | Self-reported
MultiPL-E | text | 0.86 | 85.7% | Self-reported
Showing 1 to 10 of 38 benchmarks