SimpleQA
About
SimpleQA is OpenAI's factuality benchmark for measuring language models' ability to answer short, fact-seeking questions with high correctness and low variance. It spans diverse topics, challenges even frontier models, and offers insight into how reliably AI systems return accurate, verifiable answers to straightforward factual queries.
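The SimpleQA paper grades each model answer as correct, incorrect, or not attempted, and reports both headline accuracy and accuracy over attempted questions. A minimal sketch of that scoring, assuming this three-way grading scheme (the function and field names here are illustrative, not OpenAI's code):

```python
def simpleqa_metrics(grades):
    """grades: list of strings, each 'correct', 'incorrect', or 'not_attempted'."""
    n = len(grades)
    correct = grades.count("correct")
    attempted = correct + grades.count("incorrect")
    return {
        # Headline leaderboard number: fraction of ALL questions answered correctly.
        "overall_correct": correct / n,
        # Rewards calibrated abstention: fraction correct among attempted only.
        "correct_given_attempted": correct / attempted if attempted else 0.0,
    }

print(simpleqa_metrics(["correct", "incorrect", "not_attempted", "correct"]))
```

The split between the two numbers is what lets the benchmark reward models that decline to answer rather than hallucinate.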
Evaluation Stats
- Total Models: 25
- Organizations: 7
- Verified Results: 0
- Self-Reported: 25
Benchmark Details
- Max Score: 1
- Language: en
Performance Overview
Score distribution and top performers
Score Distribution: 25 models
- Top Score: 97.1%
- Average Score: 35.1%
- High Performers (80%+): 3

Top Organizations
1. DeepSeek — 4 models, 76.9%
2. Alibaba Cloud / Qwen Team — 1 model, 54.3%
3. OpenAI — 5 models, 41.0%
4. Moonshot AI — 3 models, 32.4%
5. Google — 9 models, 20.7%
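The summary stats above (top score, average, 80%+ count, per-organization averages) are simple aggregations over per-model scores. A small sketch of that aggregation, using made-up sample scores rather than the leaderboard's actual 25 entries:

```python
from collections import defaultdict

# (organization, score) pairs -- illustrative sample, NOT the real leaderboard data
results = [
    ("DeepSeek", 0.971), ("DeepSeek", 0.934), ("DeepSeek", 0.923),
    ("OpenAI", 0.625), ("Alibaba Cloud / Qwen Team", 0.543),
    ("Google", 0.207),
]

top = max(score for _, score in results)                      # best single model
avg = sum(score for _, score in results) / len(results)       # mean over all models
high = sum(1 for _, score in results if score >= 0.80)        # models at 80%+

# Per-organization mean, the basis of the "Top Organizations" ranking
by_org = defaultdict(list)
for org, score in results:
    by_org[org].append(score)
org_means = {org: sum(v) / len(v) for org, v in by_org.items()}

print(f"Top: {top:.1%}, Avg: {avg:.1%}, 80%+: {high}")
```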
Leaderboard
25 models ranked by performance on SimpleQA
| Release Date | License | Score |
|---|---|---|
| Sep 29, 2025 | MIT | 97.1% |
| Jan 10, 2025 | MIT | 93.4% |
| May 28, 2025 | MIT | 92.3% |
| Feb 27, 2025 | Proprietary | 62.5% |
| Jul 22, 2025 | Apache 2.0 | 54.3% |
| Jun 5, 2025 | Proprietary | 54.0% |
| May 20, 2025 | Proprietary | 50.8% |
| Dec 17, 2024 | Proprietary | 47.0% |
| Sep 12, 2024 | Proprietary | 42.4% |
| Aug 6, 2024 | Proprietary | 38.2% |
Showing 1 to 10 of 25 models