SimpleQA

About

SimpleQA is OpenAI's factuality benchmark, designed to measure language models' ability to answer short, fact-seeking questions with high correctness and low variance. Questions span diverse topics and remain challenging even for frontier models, making the benchmark a useful gauge of how reliably an AI system returns accurate, verifiable answers to straightforward factual queries.
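SimpleQA grades each model answer as correct, incorrect, or not attempted, and the headline score is the fraction of answers graded correct. A minimal sketch of that scoring, using a hypothetical list of per-question grades:

```python
# Minimal sketch of SimpleQA-style scoring (hypothetical grades, not real data).
# Each answer is graded "correct", "incorrect", or "not_attempted";
# the headline score is simply the fraction graded correct.
from collections import Counter

def simpleqa_score(grades):
    """Return the fraction of answers graded correct."""
    counts = Counter(grades)
    return counts["correct"] / len(grades)

grades = ["correct", "incorrect", "not_attempted", "correct"]
print(f"{simpleqa_score(grades):.1%}")  # 2 of 4 correct -> 50.0%
```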

Evaluation Stats
Total Models: 26
Organizations: 8
Verified Results: 0
Self-Reported: 26
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers

Score Distribution

Models: 26
Top Score: 97.1%
Average Score: 37.4%
High Performers (80%+): 4

Top Organizations

#1 xAI (1 model): 95.0%
#2 DeepSeek (4 models): 76.9%
#3 Alibaba Cloud / Qwen Team (1 model): 54.3%
#4 OpenAI (5 models): 41.0%
#5 Moonshot AI (3 models): 32.4%
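The per-organization figures above combine a model count with a mean score. A sketch of that aggregation, grouping (organization, score) pairs and averaging, with made-up scores rather than the leaderboard's actual values:

```python
# Group model scores by organization and report (count, mean score),
# mirroring the "Top Organizations" summary above (hypothetical scores).
from collections import defaultdict

def org_summary(models):
    """models: list of (org, score) pairs -> {org: (n_models, mean_score)}."""
    by_org = defaultdict(list)
    for org, score in models:
        by_org[org].append(score)
    return {org: (len(s), sum(s) / len(s)) for org, s in by_org.items()}

models = [("DeepSeek", 90.0), ("DeepSeek", 60.0), ("xAI", 95.0)]
print(org_summary(models))
# xAI has one model, so its "average" is just that model's score.
```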
Leaderboard
26 models ranked by performance on SimpleQA
Release Date    License        Score
Sep 29, 2025    MIT            97.1%
Aug 28, 2025    Proprietary    95.0%
Jan 10, 2025    MIT            93.4%
May 28, 2025    MIT            92.3%
Feb 27, 2025    Proprietary    62.5%
Jul 22, 2025    Apache 2.0     54.3%
Jun 5, 2025     Proprietary    54.0%
May 20, 2025    Proprietary    50.8%
Dec 17, 2024    Proprietary    47.0%
Sep 12, 2024    Proprietary    42.4%

Showing 1 to 10 of 26 models
Resources