SuperGPQA

About

SuperGPQA is a comprehensive graduate-level benchmark that evaluates knowledge and reasoning across 285 academic disciplines with 26,529 professional questions. Questions are curated through a Human-LLM collaborative filtering mechanism, and the evaluation reveals significant performance gaps in specialized fields, testing AI models' ability to demonstrate expert-level understanding and reasoning across diverse academic and professional domains.
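For readers who want to inspect the questions directly, a minimal sketch using the Hugging Face datasets library is shown below. The dataset identifier "m-a-p/SuperGPQA" and the "train" split name are assumptions and may need adjusting.

# Minimal sketch: load SuperGPQA and look at one question.
# Assumption: the benchmark is hosted on the Hugging Face Hub as
# "m-a-p/SuperGPQA" with a "train" split; adjust if the ID or split differs.
from datasets import load_dataset

ds = load_dataset("m-a-p/SuperGPQA", split="train")
print(len(ds))   # expected to be on the order of 26,529 questions
print(ds[0])     # one multiple-choice question record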

Evaluation Stats
Total Models: 8
Organizations: 2
Verified Results: 0
Self-Reported: 8

Benchmark Details
Max Score: 1
Language: en

Performance Overview
Score distribution and top performers

Score Distribution (8 models)
Top Score: 64.9%
Average Score: 56.3%
High Performers (80%+): 0

Top Organizations

#1 Alibaba Cloud / Qwen Team: 5 models, average score 58.2%
#2 Moonshot AI: 3 models, average score 53.0%
Leaderboard
8 models ranked by performance on SuperGPQA

Rank  Date          License     Score
#1    Jul 25, 2025  Apache 2.0  64.9%
#2    Jul 22, 2025  Apache 2.0  62.6%
#3    Sep 10, 2025  Apache 2.0  60.8%
#4    Sep 10, 2025  Apache 2.0  58.8%
#5    Sep 5, 2025   MIT         57.2%
#6    Jul 11, 2025  MIT         57.2%
#7    Jul 11, 2025  MIT         44.7%
#8    Apr 29, 2025  Apache 2.0  44.1%
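As a sanity check, the headline numbers in the Performance Overview follow directly from the eight scores listed above; a minimal sketch in Python recomputing them is shown below.

# Recompute the Performance Overview statistics from the per-model
# scores in the leaderboard above (model names omitted in this listing).
scores = [64.9, 62.6, 60.8, 58.8, 57.2, 57.2, 44.7, 44.1]  # percent

top_score = max(scores)                           # 64.9
average_score = sum(scores) / len(scores)         # 56.2875, reported as 56.3
high_performers = sum(s >= 80.0 for s in scores)  # 0

print(f"Top score:       {top_score:.1f}%")
print(f"Average score:   {average_score:.1f}%")
print(f"High performers: {high_performers}")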
Resources