SuperGPQA

About

SuperGPQA is a comprehensive graduate-level benchmark that evaluates knowledge and reasoning across 285 academic disciplines with 26,529 professional questions. Questions are curated through a Human-LLM collaborative filtering mechanism, and the evaluation reveals significant performance gaps in specialized fields, testing AI models' ability to demonstrate expert-level understanding and reasoning across diverse academic and professional domains.
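For readers who want to inspect the questions directly, a minimal sketch using the Hugging Face datasets library is shown below. The dataset identifier "m-a-p/SuperGPQA" and the "train" split name are assumptions and may need adjusting.

# Minimal sketch: load SuperGPQA and look at one question.
# Assumption: the benchmark is hosted on the Hugging Face Hub as
# "m-a-p/SuperGPQA" with a "train" split; adjust if the ID or split differs.
from datasets import load_dataset

ds = load_dataset("m-a-p/SuperGPQA", split="train")
print(len(ds))   # expected to be on the order of 26,529 questions
print(ds[0])     # one multiple-choice question record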

Evaluation Stats
Total Models: 8
Organizations: 2
Verified Results: 0
Self-Reported: 8

Benchmark Details
Max Score: 1
Language: en

Performance Overview
Score distribution and top performers

Score Distribution (8 models)
Top Score: 64.9%
Average Score: 56.3%
High Performers (80%+): 0

Top Organizations

#1 Alibaba Cloud / Qwen Team: 5 models, average score 58.2%
#2 Moonshot AI: 3 models, average score 53.0%
Leaderboard
8 models ranked by performance on SuperGPQA

Rank  Date          License     Score
#1    Jul 25, 2025  Apache 2.0  64.9%
#2    Jul 22, 2025  Apache 2.0  62.6%
#3    Sep 10, 2025  Apache 2.0  60.8%
#4    Sep 10, 2025  Apache 2.0  58.8%
#5    Sep 5, 2025   MIT         57.2%
#6    Jul 11, 2025  MIT         57.2%
#7    Jul 11, 2025  MIT         44.7%
#8    Apr 29, 2025  Apache 2.0  44.1%
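As a sanity check, the headline numbers in the Performance Overview follow directly from the eight scores listed above; a minimal sketch in Python recomputing them is shown below.

# Recompute the Performance Overview statistics from the per-model
# scores in the leaderboard above (model names omitted in this listing).
scores = [64.9, 62.6, 60.8, 58.8, 57.2, 57.2, 44.7, 44.1]  # percent

top_score = max(scores)                           # 64.9
average_score = sum(scores) / len(scores)         # 56.2875, reported as 56.3
high_performers = sum(s >= 80.0 for s in scores)  # 0

print(f"Top score:       {top_score:.1f}%")
print(f"Average score:   {average_score:.1f}%")
print(f"High performers: {high_performers}")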
Resources