AGIEval

About

AGIEval is a human-centric benchmark that evaluates foundation models on real standardized exams, including college entrance tests (SAT, Gaokao), the law school admission test (LSAT), math competitions, and professional qualification exams. The benchmark is bilingual (English and Chinese) and assesses four core capabilities: understanding, knowledge, reasoning, and calculation. By measuring models against human performance on authentic exams rather than purpose-built datasets, AGIEval offers a meaningful yardstick for progress toward Artificial General Intelligence.

Evaluation Stats
Total Models: 5
Organizations: 3
Verified Results: 0
Self-Reported: 5
Benchmark Details
Max Score: 1 (scores are normalized to [0, 1] and displayed as percentages)
Language: en
Performance Overview
Score distribution and top performers

Score Distribution (5 models)
Top Score: 65.8%
Average Score: 54.3%
High Performers (80%+): 0
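
The summary numbers above can be recomputed directly from the five scores in the leaderboard at the bottom of this page. A minimal Python sketch; the score list is transcribed from that leaderboard, not fetched from anywhere:

    # Scores (in percent) transcribed from the AGIEval leaderboard below.
    scores = [65.8, 55.1, 52.8, 49.3, 48.3]

    top_score = max(scores)                           # 65.8
    average = sum(scores) / len(scores)               # 54.26 -> reported as 54.3%
    high_performers = sum(s >= 80.0 for s in scores)  # 0

    print(f"Top score:       {top_score:.1f}%")
    print(f"Average score:   {average:.1f}%")
    print(f"High performers: {high_performers}")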

Top Organizations

#1 Mistral AI: 2 models, 57.0%
#2 Google: 2 models, 54.0%
#3 IBM: 1 model, 49.3%
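
The per-organization figures are simple means over each organization's models. In the sketch below, the row-to-organization mapping is inferred from the license column and the per-org model counts and averages above; the source page does not label each leaderboard row with an organization:

    # Per-organization score averages. The grouping is an inference:
    # the two Gemma-licensed rows are attributed to Google, and the
    # Apache 2.0 (65.8%) and Mistral Research License rows to Mistral AI,
    # which reproduces the stated averages.
    org_scores = {
        "Mistral AI": [65.8, 48.3],
        "Google":     [55.1, 52.8],
        "IBM":        [49.3],
    }

    ranked = sorted(org_scores.items(),
                    key=lambda kv: -sum(kv[1]) / len(kv[1]))
    for rank, (org, scores) in enumerate(ranked, start=1):
        avg = sum(scores) / len(scores)
        print(f"#{rank} {org}: {len(scores)} model(s), {avg:.1f}%")
    # -> Mistral AI 57.0%, Google 54.0%, IBM 49.3%, matching the page.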
Leaderboard
5 models ranked by performance on AGIEval

Rank  Release Date  License                    Score
#1    Jan 30, 2025  Apache 2.0                 65.8%
#2    Jun 27, 2024  Gemma                      55.1%
#3    Jun 27, 2024  Gemma                      52.8%
#4    Apr 16, 2025  Apache 2.0                 49.3%
#5    Oct 16, 2024  Mistral Research License   48.3%