SimpleQA
About
SimpleQA is OpenAI's factuality benchmark: a set of short, fact-seeking questions that measures how often a language model answers correctly, with grading designed for low run-to-run variance. The questions span diverse topics and remain difficult even for frontier models, making the benchmark a useful gauge of how reliably a model returns accurate, verifiable answers to straightforward factual queries.
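To make the metric concrete, here is a minimal sketch of SimpleQA-style scoring. It is not OpenAI's official harness: the real benchmark uses an LLM judge to classify each short answer as correct, incorrect, or not attempted, whereas the `grade` function below substitutes a naive substring check, and `Example` and the sample question are hypothetical stand-ins.

```python
from dataclasses import dataclass

# SimpleQA grades each model answer as correct, incorrect, or not attempted.
CORRECT, INCORRECT, NOT_ATTEMPTED = "correct", "incorrect", "not_attempted"

@dataclass
class Example:
    question: str
    gold_answer: str

def grade(model_answer: str, gold_answer: str) -> str:
    # Stand-in grader: the real benchmark uses an LLM judge to compare the
    # model's answer against the gold answer. A naive substring check is
    # used here purely for illustration.
    if not model_answer.strip():
        return NOT_ATTEMPTED
    return CORRECT if gold_answer.lower() in model_answer.lower() else INCORRECT

def simpleqa_score(examples: list[Example], answers: list[str]) -> float:
    # The leaderboard score is the fraction of questions answered correctly
    # (max 1.0, displayed as a percentage).
    grades = [grade(ans, ex.gold_answer) for ex, ans in zip(examples, answers)]
    return grades.count(CORRECT) / len(grades)

# Hypothetical usage with a made-up question:
examples = [Example("In what year was OpenAI founded?", "2015")]
print(simpleqa_score(examples, ["OpenAI was founded in 2015."]))  # 1.0
```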
Evaluation Stats
Total Models: 26
Organizations: 8
Verified Results: 0
Self-Reported: 26
Benchmark Details
Max Score: 1
Language: English (en)
Performance Overview
Score distribution and top performers
Score Distribution (26 models)
Top Score: 97.1%
Average Score: 37.4%
High Performers (80%+): 4

Top Organizations
#1 xAI: 1 model, 95.0%
#2 DeepSeek: 4 models, 76.9%
#3 Alibaba Cloud / Qwen Team: 1 model, 54.3%
#4 OpenAI: 5 models, 41.0%
#5 Moonshot AI: 3 models, 32.4%
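As a sanity check, a few lines of Python reproduce the aggregates above from the raw per-model scores. The list below is truncated placeholder data standing in for the 26 self-reported results, so only the top score and the 80%+ count match the page exactly.

```python
# Placeholder scores standing in for the 26 self-reported results
# (fractions of the max score 1; the full list comes from the leaderboard).
scores = [0.971, 0.950, 0.934, 0.923, 0.625, 0.543]

top_score = max(scores)                           # Top Score
average = sum(scores) / len(scores)               # Average Score
high_performers = sum(s >= 0.80 for s in scores)  # High Performers (80%+)

print(f"Top score: {top_score:.1%}")              # 97.1%
print(f"Average:   {average:.1%}")                # 37.4% over the full list
print(f">= 80%:    {high_performers} models")     # 4
```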
Leaderboard
26 models ranked by performance on SimpleQA
| Release Date | License | Score |
|---|---|---|
| Sep 29, 2025 | MIT | 97.1% |
| Aug 28, 2025 | Proprietary | 95.0% |
| Jan 10, 2025 | MIT | 93.4% |
| May 28, 2025 | MIT | 92.3% |
| Feb 27, 2025 | Proprietary | 62.5% |
| Jul 22, 2025 | Apache 2.0 | 54.3% |
| Jun 5, 2025 | Proprietary | 54.0% |
| May 20, 2025 | Proprietary | 50.8% |
| Dec 17, 2024 | Proprietary | 47.0% |
| Sep 12, 2024 | Proprietary | 42.4% |
Showing 1 to 10 of 26 models