AGIEval
About
AGIEval is a human-centric AI benchmark that evaluates foundation models on real standardized exams, including college entrance tests (SAT, Gaokao), law school admissions tests (LSAT), math competitions, and professional qualification exams. The benchmark is bilingual (English and Chinese) and assesses four core capabilities: understanding, knowledge, reasoning, and calculation. By measuring models against human performance on authentic exams rather than synthetic datasets, AGIEval provides a meaningful evaluation of progress toward Artificial General Intelligence.
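In practice, most AGIEval tasks are multiple-choice, so a model's score on them is plain accuracy over the items. The sketch below illustrates that kind of scoring loop only; the item fields (`question`, `options`, `label`), the sample item, and the `ask_model` callable are illustrative stand-ins, not the benchmark's actual data layout or official harness.

```python
from typing import Callable

# Hypothetical item layout: one multiple-choice question with a gold label.
ITEMS = [
    {
        "question": "If 3x + 5 = 20, what is the value of x?",
        "options": ["(A) 3", "(B) 5", "(C) 15", "(D) 25"],
        "label": "B",
    },
]

def score(items: list[dict], ask_model: Callable[[str], str]) -> float:
    """Return accuracy: the fraction of items where the model's
    letter choice matches the gold label."""
    correct = 0
    for item in items:
        prompt = item["question"] + "\n" + "\n".join(item["options"])
        prediction = ask_model(prompt)  # expected to return a letter, e.g. "B"
        correct += prediction.strip().upper() == item["label"]
    return correct / len(items)

# Stub model that always answers "B": scores 1.0 on this one-item sample.
print(score(ITEMS, lambda prompt: "B"))
```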
Evaluation Stats
Total Models: 5
Organizations: 3
Verified Results: 0
Self-Reported: 5
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers
Score Distribution: 5 models
Top Score: 65.8%
Average Score: 54.3%
High Performers (80%+): 0

Top Organizations
#1 Mistral AI (2 models): 57.0%
#2 Google (2 models): 54.0%
#3 IBM (1 model): 49.3%
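The aggregate figures above follow directly from the five scores in the Leaderboard table below; a quick check in Python:

```python
scores = [65.8, 55.1, 52.8, 49.3, 48.3]  # the five leaderboard entries

top = max(scores)                                # 65.8 -> "Top Score: 65.8%"
average = sum(scores) / len(scores)              # 54.26 -> "Average Score: 54.3%"
high_performers = sum(s >= 80 for s in scores)   # 0 -> "High Performers (80%+): 0"

print(f"top={top}%  average={average:.1f}%  80%+={high_performers}")
```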
Leaderboard
5 models ranked by performance on AGIEval
| Release Date | License | Score |
|---|---|---|
| Jan 30, 2025 | Apache 2.0 | 65.8% |
| Jun 27, 2024 | Gemma | 55.1% |
| Jun 27, 2024 | Gemma | 52.8% |
| Apr 16, 2025 | Apache 2.0 | 49.3% |
| Oct 16, 2024 | Mistral Research License | 48.3% |