PIQA
About
PIQA (Physical Interaction Question Answering) is a benchmark for physical commonsense reasoning that tests models' understanding of everyday physical interactions through multiple-choice questions: each item presents a goal and two candidate solutions, and the model must pick the more plausible one. Drawing inspiration from instructables.com, the benchmark emphasizes atypical but practical solutions to physical problems, probing models' grasp of real-world physics and everyday physical common sense.
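Because each PIQA item is a binary choice between two candidate solutions, accuracy can be computed with a simple loop. The sketch below is a minimal illustration, assuming the Hugging Face `datasets` library and the public `piqa` dataset (whose validation split exposes `goal`, `sol1`, `sol2`, and `label` fields); `score_solution` is a hypothetical placeholder for whatever plausibility score a given model assigns to a (goal, solution) pair.

```python
# Minimal sketch of a PIQA evaluation loop (not the scoring harness behind this
# leaderboard). Assumes the Hugging Face "datasets" library and the public "piqa"
# dataset; depending on the datasets version, trust_remote_code=True may be required.
from datasets import load_dataset


def score_solution(goal: str, solution: str) -> float:
    """Hypothetical placeholder: a model's plausibility score for (goal, solution).

    A toy lexical-overlap heuristic stands in for a real model here.
    """
    return float(len(set(goal.lower().split()) & set(solution.lower().split())))


def evaluate_piqa() -> float:
    ds = load_dataset("piqa", split="validation")
    correct = 0
    for ex in ds:
        # Predict whichever of the two candidate solutions the model scores higher.
        better_first = score_solution(ex["goal"], ex["sol1"]) >= score_solution(ex["goal"], ex["sol2"])
        pred = 0 if better_first else 1
        correct += int(pred == ex["label"])
    return correct / len(ds)


if __name__ == "__main__":
    print(f"PIQA validation accuracy: {evaluate_piqa():.1%}")
```

In practice, submitters typically replace the heuristic with a language model's log-likelihood of each solution conditioned on the goal; the leaderboard scores below reflect whatever protocol each submitter reported.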
Evaluation Stats
Total Models: 9
Organizations: 2
Verified Results: 0
Self-Reported: 9
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers

Score Distribution: 9 models
Top Score: 88.6%
Average Score: 81.3%
High Performers (80%+): 6

Top Organizations
#1 Microsoft: 3 models, 82.4%
#2 Google: 6 models, 80.8%
Leaderboard
9 models ranked by performance on PIQA
Release Date | License | Score
---|---|---
Aug 23, 2024 | MIT | 88.6%
Jun 27, 2024 | Gemma | 83.2%
Jun 27, 2024 | Gemma | 81.7%
Jun 26, 2025 | Proprietary | 81.0%
May 20, 2025 | Gemma | 81.0%
Aug 23, 2024 | MIT | 81.0%
May 20, 2025 | Gemma | 78.9%
Jun 26, 2025 | Proprietary | 78.9%
Feb 1, 2025 | MIT | 77.6%