C-Eval

Multilingual · text
About

C-Eval is the first comprehensive Chinese evaluation suite for foundation models, featuring 13,948 multiple-choice questions spanning 52 diverse disciplines and four difficulty levels, from middle school to professional. The benchmark assesses the advanced knowledge and reasoning abilities of language models in Chinese contexts, and serves as a standard reference for measuring LLM performance on Chinese language understanding and domain-specific knowledge.
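As a rough illustration of how such an evaluation runs, here is a minimal sketch of accuracy scoring on C-Eval-style items. It assumes the Hugging Face dataset id `ceval/ceval-exam` (one config per discipline; fields question, A-D, answer) and a hypothetical `answer_fn` wrapping whatever model is under test; neither is prescribed by this page.

```python
# Minimal sketch of C-Eval-style scoring: accuracy over multiple-choice items.
# Assumptions (not from this page): the Hugging Face dataset "ceval/ceval-exam"
# with fields question, A, B, C, D, answer; `answer_fn` is a hypothetical
# stand-in for the model being evaluated.
from datasets import load_dataset

def evaluate_subject(answer_fn, subject: str = "computer_network") -> float:
    # The "val" split carries labeled answers; "test" answers are held out.
    ds = load_dataset("ceval/ceval-exam", name=subject, split="val")
    correct = 0
    for item in ds:
        prompt = (
            f"{item['question']}\n"
            f"A. {item['A']}\nB. {item['B']}\n"
            f"C. {item['C']}\nD. {item['D']}\n"
            "答案："  # "Answer:" — C-Eval items are written in Chinese
        )
        prediction = answer_fn(prompt)  # expected to return "A"/"B"/"C"/"D"
        correct += prediction.strip().upper().startswith(item["answer"])
    return correct / len(ds)
```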

Evaluation Stats
Total Models: 5
Organizations: 3
Verified Results: 0
Self-Reported: 5
Benchmark Details
Max Score: 1
Language: zh (Chinese)
Performance Overview
Score distribution and top performers

Score Distribution (5 models)
Top Score: 92.5%
Average Score: 85.7%
High Performers (80%+): 4

Top Organizations

#1 Moonshot AI (2 models): 90.4%
#2 DeepSeek (1 model): 86.5%
#3 Alibaba Cloud / Qwen Team (2 models): 80.5%
Leaderboard
5 models ranked by performance on C-Eval
Rank  Release Date  License                                        Score
1     Jul 11, 2025  MIT                                            92.5%
2     Jan 20, 2025  Proprietary                                    88.3%
3     Dec 25, 2024  MIT + Model License (Commercial use allowed)   86.5%
4     Jul 23, 2024  tongyi-qianwen                                 83.8%
5     Jul 23, 2024  Apache 2.0                                     77.2%
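
The Performance Overview figures follow directly from the five leaderboard scores above; a quick sketch to reproduce them:

```python
# Cross-check of the Performance Overview stats from the leaderboard scores.
scores = [92.5, 88.3, 86.5, 83.8, 77.2]

top = max(scores)                                 # 92.5
average = sum(scores) / len(scores)               # 428.3 / 5 = 85.66 -> 85.7
high_performers = sum(s >= 80.0 for s in scores)  # 4 (only 77.2 falls short)

print(f"Top: {top}%  Average: {average:.1f}%  80%+: {high_performers}")
```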
Resources