BoolQ
About
BoolQ is a question answering benchmark of 15,942 naturally occurring yes/no questions that tests reading comprehension and natural language inference. The questions are unprompted and unconstrained, so they reflect real-world information-seeking scenarios. Models must perform binary classification on complex questions that require deep text understanding, which makes the task similar to existing natural language inference tasks but with practical, realistic queries.
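Because each question has a yes/no answer, scoring reduces to binary classification accuracy. The following is a minimal sketch of that scoring, not the harness used for the results on this page. The names `normalize_answer` and `accuracy`, the accepted answer words, and the toy data are all assumptions made for illustration.

```python
import re

# Hypothetical answer vocabulary; a real harness may accept other forms.
YES_WORDS = {"yes", "true"}
NO_WORDS = {"no", "false"}


def normalize_answer(text):
    """Map free-form output to True/False, or None if no clear answer."""
    match = re.match(r"[a-z]+", text.strip().lower())
    if match is None:
        return None
    word = match.group(0)
    if word in YES_WORDS:
        return True
    if word in NO_WORDS:
        return False
    return None


def accuracy(predictions, gold_labels):
    """Fraction of predictions matching the gold yes/no labels (0 to 1)."""
    if len(predictions) != len(gold_labels):
        raise ValueError("predictions and gold_labels differ in length")
    if not gold_labels:
        raise ValueError("no examples to score")
    correct = sum(
        normalize_answer(p) == g for p, g in zip(predictions, gold_labels)
    )
    return correct / len(gold_labels)


# Toy data, invented for illustration only (not drawn from BoolQ).
predictions = ["Yes.", "no", "maybe", "True"]
gold_labels = [True, False, True, False]
print(accuracy(predictions, gold_labels))  # prints 0.5
```

An unparseable output such as "maybe" normalizes to `None` and is counted as wrong in this sketch; other harnesses may handle such cases differently.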
Evaluation Stats
Total Models: 9
Organizations: 2
Verified Results: 0
Self-Reported: 9
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers
Score Distribution: 9 models
Top Score: 84.8%
Average Score: 81.0%
High Performers (80%+): 6
Top Organizations
#1 Microsoft: 3 models, 81.3%
#2 Google: 6 models, 80.8%
Leaderboard
9 models ranked by performance on BoolQ
Date | License | Score
---|---|---
Jun 27, 2024 | Gemma | 84.8%
Aug 23, 2024 | MIT | 84.6%
Jun 27, 2024 | Gemma | 84.2%
Jun 26, 2025 | Proprietary | 81.6%
May 20, 2025 | Gemma | 81.6%
Feb 1, 2025 | MIT | 81.2%
Aug 23, 2024 | MIT | 78.0%
Jun 26, 2025 | Proprietary | 76.4%
May 20, 2025 | Gemma | 76.4%
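The summary figures in the Performance Overview can be rechecked from the leaderboard scores above. The sketch below does that arithmetic. The mapping from license to organization is an assumption inferred from the model counts (three MIT rows, three Microsoft models), not something the page states, and the per-organization figure is assumed to be a mean score.

```python
from statistics import mean

# (license, score in percent) copied from the leaderboard rows.
rows = [
    ("Gemma", 84.8), ("MIT", 84.6), ("Gemma", 84.2),
    ("Proprietary", 81.6), ("Gemma", 81.6), ("MIT", 81.2),
    ("MIT", 78.0), ("Proprietary", 76.4), ("Gemma", 76.4),
]

# ASSUMPTION: license implies organization; inferred from model counts only.
ORG_BY_LICENSE = {"MIT": "Microsoft", "Gemma": "Google", "Proprietary": "Google"}

scores = [score for _, score in rows]
summary = {
    "models": len(scores),
    "top": max(scores),
    "average": round(mean(scores), 1),
    "high_performers": sum(score >= 80.0 for score in scores),
}

# Group scores by the assumed organization and average each group.
by_org = {}
for license_name, score in rows:
    by_org.setdefault(ORG_BY_LICENSE[license_name], []).append(score)
org_average = {org: round(mean(vals), 1) for org, vals in by_org.items()}

print(summary)
print(org_average)
```

Under these assumptions the recomputed values agree with the figures shown in the overview: a top score of 84.8, an average of 81.0, six models at 80% or above, and organization averages of 81.3 and 80.8.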