C-Eval
Multilingual
text
About
C-Eval is the first comprehensive Chinese evaluation suite for foundation models, featuring 13,948 multiple-choice questions spanning 52 diverse disciplines and four difficulty levels. The benchmark assesses the advanced knowledge and reasoning abilities of language models in Chinese contexts, covering subjects from the middle-school to the professional level. C-Eval serves as a key evaluation tool for measuring LLM performance on Chinese language understanding and domain-specific knowledge.
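Since every C-Eval question is multiple-choice with a single correct option, a model's score is plain accuracy over the question set, reported as a percentage. A minimal scoring sketch, assuming illustrative field names ("answer" for the gold option, "prediction" for the model's pick) rather than C-Eval's actual data schema:

```python
def ceval_accuracy(questions):
    """Return multiple-choice accuracy as a percentage.

    Each item is a dict holding the gold option letter under "answer"
    and the model's chosen option under "prediction" (illustrative
    field names, not C-Eval's real schema).
    """
    if not questions:
        return 0.0
    correct = sum(1 for q in questions if q["prediction"] == q["answer"])
    return 100.0 * correct / len(questions)

# Toy run: 3 of 4 answers correct -> 75.0
items = [
    {"answer": "A", "prediction": "A"},
    {"answer": "C", "prediction": "C"},
    {"answer": "B", "prediction": "D"},
    {"answer": "D", "prediction": "D"},
]
print(ceval_accuracy(items))  # 75.0
```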
Evaluation Stats
Total Models: 5
Organizations: 3
Verified Results: 0
Self-Reported: 5
Benchmark Details
Max Score: 1
Language: zh (Chinese)
Performance Overview
Score distribution and top performers

Score Distribution: 5 models
Top Score: 92.5%
Average Score: 85.7%
High Performers (80%+): 4

Top Organizations
#1 Moonshot AI: 2 models, 90.4%
#2 DeepSeek: 1 model, 86.5%
#3 Alibaba Cloud / Qwen Team: 2 models, 80.5%
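The organization figures above are simple means of each organization's model scores from the leaderboard; a quick check, rounding to one decimal place as the page does:

```python
# Leaderboard scores grouped by organization (taken from this page).
org_scores = {
    "Moonshot AI": [92.5, 88.3],
    "DeepSeek": [86.5],
    "Alibaba Cloud / Qwen Team": [83.8, 77.2],
}

# Per-organization mean score, rounded to one decimal.
org_avg = {org: round(sum(s) / len(s), 1) for org, s in org_scores.items()}
print(org_avg)
# {'Moonshot AI': 90.4, 'DeepSeek': 86.5, 'Alibaba Cloud / Qwen Team': 80.5}

# Overall mean across all five models matches the Average Score stat.
all_scores = [s for scores in org_scores.values() for s in scores]
print(round(sum(all_scores) / len(all_scores), 1))  # 85.7
```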
Leaderboard
5 models ranked by performance on C-Eval

| Release Date | License | Score |
|---|---|---|
| Jul 11, 2025 | MIT | 92.5% |
| Jan 20, 2025 | Proprietary | 88.3% |
| Dec 25, 2024 | MIT + Model License (commercial use allowed) | 86.5% |
| Jul 23, 2024 | tongyi-qianwen | 83.8% |
| Jul 23, 2024 | Apache 2.0 | 77.2% |