SimpleQA

About

SimpleQA is OpenAI's factuality benchmark, designed to measure language models' ability to answer short, fact-seeking questions with high correctness and low variance. It tests factual knowledge across diverse topics, remains challenging even for frontier models, and gives a direct read on how reliably a model provides accurate, verifiable answers to straightforward factual queries.
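
Scoring follows the scheme from OpenAI's SimpleQA paper: each answer is graded CORRECT, INCORRECT, or NOT_ATTEMPTED (the reference implementation in openai/simple-evals delegates grading to an LLM grader), and the headline number is the share of questions graded CORRECT. Below is a minimal sketch of such an evaluation loop; ask_model and grade_response are hypothetical stand-ins for the model under test and the grader, not the reference code.

from dataclasses import dataclass

@dataclass
class Example:
    question: str
    gold_answer: str  # each SimpleQA question has a single short answer

def ask_model(question: str) -> str:
    # Hypothetical: query the model under test.
    raise NotImplementedError

def grade_response(question: str, gold_answer: str, response: str) -> str:
    # Hypothetical: return "CORRECT", "INCORRECT", or "NOT_ATTEMPTED".
    # OpenAI's reference implementation delegates this step to an LLM grader.
    raise NotImplementedError

def evaluate(examples: list[Example]) -> dict[str, float]:
    # Tally the three grades; the CORRECT share is the score
    # reported on the leaderboard below.
    counts = {"CORRECT": 0, "INCORRECT": 0, "NOT_ATTEMPTED": 0}
    for ex in examples:
        grade = grade_response(ex.question, ex.gold_answer, ask_model(ex.question))
        counts[grade] += 1
    return {grade: n / len(examples) for grade, n in counts.items()}

The three-way grade is what lets the benchmark distinguish models that decline to answer from models that answer incorrectly, rather than collapsing both into a single failure mode.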

Evaluation Stats
Total Models: 25
Organizations: 7
Verified Results: 0
Self-Reported: 25
Benchmark Details
Max Score: 1
Language: English (en)
Performance Overview
Score distribution and top performers

Score Distribution (25 models)

Top Score: 97.1%
Average Score: 35.1%
High Performers (80%+): 3

Top Organizations

Rank  Organization               Models  Score
#1    DeepSeek                   4       76.9%
#2    Alibaba Cloud / Qwen Team  1       54.3%
#3    OpenAI                     5       41.0%
#4    Moonshot AI                3       32.4%
#5    Google                     9       20.7%
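
The summary figures above are plain aggregates over the per-model scores. A minimal sketch of how they can be derived, assuming scores are available as (organization, percentage) pairs and that an organization's ranking score is the mean over its models:

from collections import defaultdict

def summarize(records: list[tuple[str, float]]) -> dict:
    # records: hypothetical (organization, score_percent) pairs
    # standing in for the full 25-model leaderboard.
    scores = [s for _, s in records]
    by_org: dict[str, list[float]] = defaultdict(list)
    for org, score in records:
        by_org[org].append(score)
    org_ranking = sorted(
        ((org, sum(v) / len(v), len(v)) for org, v in by_org.items()),
        key=lambda t: t[1],
        reverse=True,
    )
    return {
        "top_score": max(scores),
        "average_score": sum(scores) / len(scores),
        "high_performers_80_plus": sum(1 for s in scores if s >= 80.0),
        "top_organizations": org_ranking,  # (org, mean score, model count)
    }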
Leaderboard
25 models ranked by performance on SimpleQA
Release Date  License      Score
Sep 29, 2025  MIT          97.1%
Jan 10, 2025  MIT          93.4%
May 28, 2025  MIT          92.3%
Feb 27, 2025  Proprietary  62.5%
Jul 22, 2025  Apache 2.0   54.3%
Jun 5, 2025   Proprietary  54.0%
May 20, 2025  Proprietary  50.8%
Dec 17, 2024  Proprietary  47.0%
Sep 12, 2024  Proprietary  42.4%
Aug 6, 2024   Proprietary  38.2%
Showing 1 to 10 of 25 models
Resources