AlignBench

Tags: Multilingual, text
About

AlignBench is a comprehensive multi-dimensional benchmark for evaluating Large Language Model alignment, particularly in Chinese. It assesses how well AI systems interpret, internalize, and execute human instructions through 683 real-scenario queries across 8 categories. The benchmark evaluates literal, implicit, and intentional compliance, complex conflict resolution, multi-turn dialogue consistency, and adaptive reinterpretation capabilities in scenarios with profound ambiguity.

Evaluation Stats
Total Models: 4
Organizations: 2
Verified Results: 0
Self-Reported: 4
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers

Score Distribution (4 models)
Top Score: 81.6%
Average Score: 76.9%
High Performers (80%+): 2

Top Organizations
#1 DeepSeek (1 model): 80.4%
#2 Alibaba Cloud / Qwen Team (3 models): 75.7%
Leaderboard
4 models ranked by performance on AlignBench

Date          License     Score
Sep 19, 2024  Qwen        81.6%
May 8, 2024   deepseek    80.4%
Sep 19, 2024  Apache 2.0  73.3%
Jul 23, 2024  Apache 2.0  72.1%
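The summary figures in Performance Overview follow directly from the four leaderboard scores. A minimal Python sketch recomputing them (scores hard-coded from this page; the 80% "high performer" threshold is taken from the overview itself):

```python
# AlignBench scores from the leaderboard above, in percent.
scores = [81.6, 80.4, 73.3, 72.1]

top_score = max(scores)                              # 81.6
average = sum(scores) / len(scores)                  # ~76.85, shown as 76.9%
high_performers = sum(1 for s in scores if s >= 80)  # 2 models at 80%+

print(f"Top Score: {top_score}%")
print(f"Average Score: {average:.1f}%")
print(f"High Performers (80%+): {high_performers}")
```

This reproduces the 81.6% top score, the 76.9% average, and the count of 2 high performers shown in the overview.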
Resources