C-Eval

Multilingual · text
About

C-Eval is the first comprehensive Chinese evaluation suite for foundation models, featuring 13,948 multiple-choice questions spanning 52 diverse disciplines and four difficulty levels, from middle school to professional. The benchmark assesses the advanced knowledge and reasoning abilities of language models in Chinese contexts, and serves as a standard reference for measuring LLM performance on Chinese language understanding and domain-specific knowledge.
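As a rough illustration of how such an evaluation runs, here is a minimal sketch of accuracy scoring on C-Eval-style items. It assumes the Hugging Face dataset id `ceval/ceval-exam` (one config per discipline; fields question, A-D, answer) and a hypothetical `answer_fn` wrapping whatever model is under test; neither is prescribed by this page.

```python
# Minimal sketch of C-Eval-style scoring: accuracy over multiple-choice items.
# Assumptions (not from this page): the Hugging Face dataset "ceval/ceval-exam"
# with fields question, A, B, C, D, answer; `answer_fn` is a hypothetical
# stand-in for the model being evaluated.
from datasets import load_dataset

def evaluate_subject(answer_fn, subject: str = "computer_network") -> float:
    # The "val" split carries labeled answers; "test" answers are held out.
    ds = load_dataset("ceval/ceval-exam", name=subject, split="val")
    correct = 0
    for item in ds:
        prompt = (
            f"{item['question']}\n"
            f"A. {item['A']}\nB. {item['B']}\n"
            f"C. {item['C']}\nD. {item['D']}\n"
            "答案："  # "Answer:" — C-Eval items are written in Chinese
        )
        prediction = answer_fn(prompt)  # expected to return "A"/"B"/"C"/"D"
        correct += prediction.strip().upper().startswith(item["answer"])
    return correct / len(ds)
```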

Evaluation Stats
Total Models: 5
Organizations: 3
Verified Results: 0
Self-Reported: 5
Benchmark Details
Max Score: 1
Language: zh (Chinese)
Performance Overview
Score distribution and top performers

Score Distribution (5 models)
Top Score: 92.5%
Average Score: 85.7%
High Performers (80%+): 4

Top Organizations

#1 Moonshot AI (2 models): 90.4%
#2 DeepSeek (1 model): 86.5%
#3 Alibaba Cloud / Qwen Team (2 models): 80.5%
Leaderboard
5 models ranked by performance on C-Eval
Rank  Release Date  License                                        Score
1     Jul 11, 2025  MIT                                            92.5%
2     Jan 20, 2025  Proprietary                                    88.3%
3     Dec 25, 2024  MIT + Model License (Commercial use allowed)   86.5%
4     Jul 23, 2024  tongyi-qianwen                                 83.8%
5     Jul 23, 2024  Apache 2.0                                     77.2%
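
The Performance Overview figures follow directly from the five leaderboard scores above; a quick sketch to reproduce them:

```python
# Cross-check of the Performance Overview stats from the leaderboard scores.
scores = [92.5, 88.3, 86.5, 83.8, 77.2]

top = max(scores)                                 # 92.5
average = sum(scores) / len(scores)               # 428.3 / 5 = 85.66 -> 85.7
high_performers = sum(s >= 80.0 for s in scores)  # 4 (only 77.2 falls short)

print(f"Top: {top}%  Average: {average:.1f}%  80%+: {high_performers}")
```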
Resources