SuperGPQA
About
SuperGPQA is a comprehensive graduate-level benchmark of 26,529 professional questions spanning 285 academic disciplines. Questions are curated through a human-LLM collaborative filtering process, and the resulting evaluation exposes significant performance gaps in specialized fields, testing AI models' ability to demonstrate expert-level understanding and reasoning across diverse academic and professional domains.
Evaluation Stats
Total Models: 8
Organizations: 2
Verified Results: 0
Self-Reported: 8
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers

Score Distribution: 8 models
Top Score: 64.9%
Average Score: 56.3%
High Performers (80%+): 0

Top Organizations
#1 Alibaba Cloud / Qwen Team: 5 models, 58.2%
#2 Moonshot AI: 3 models, 53.0%
Leaderboard
8 models ranked by performance on SuperGPQA
Date | License | Score
---|---|---
Jul 25, 2025 | Apache 2.0 | 64.9%
Jul 22, 2025 | Apache 2.0 | 62.6%
Sep 10, 2025 | Apache 2.0 | 60.8%
Sep 10, 2025 | Apache 2.0 | 58.8%
Sep 5, 2025 | MIT | 57.2%
Jul 11, 2025 | MIT | 57.2%
Jul 11, 2025 | MIT | 44.7%
Apr 29, 2025 | Apache 2.0 | 44.1%
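The summary statistics in the Performance Overview can be recomputed directly from the eight leaderboard scores. A minimal sketch (the variable names are illustrative, not part of the site):

```python
# The eight scores from the leaderboard table above, in percent.
scores = [64.9, 62.6, 60.8, 58.8, 57.2, 57.2, 44.7, 44.1]

top_score = max(scores)                          # Top Score
average = round(sum(scores) / len(scores), 1)    # Average Score
high_performers = sum(s >= 80 for s in scores)   # High Performers (80%+)

print(top_score, average, high_performers)  # → 64.9 56.3 0
```

These reproduce the displayed values: a top score of 64.9%, an average of 56.3%, and no model at or above 80%.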