PIQA

About

PIQA (Physical Interaction Question Answering) is a benchmark for physical commonsense reasoning. It tests how well AI models understand everyday physical interactions through multiple-choice questions: each item pairs a goal with two candidate solutions, and the model must pick the more appropriate one. Drawing inspiration from instructables.com, the benchmark focuses on atypical but practical solutions to physical problems, so a model needs a working grasp of real-world physics rather than surface-level pattern matching.
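To make the question format concrete, here is a minimal Python sketch of a PIQA-style item and of accuracy scoring. The field names `goal`, `sol1`, `sol2`, and `label` are assumed for illustration, the `accuracy` helper is hypothetical, and both items are invented examples rather than entries from the benchmark. Accuracy is the fraction of items answered correctly, so it always falls between 0 and 1.

```python
from dataclasses import dataclass


@dataclass
class PiqaItem:
    """One PIQA-style item: a goal and two candidate solutions (field names assumed)."""
    goal: str
    sol1: str
    sol2: str
    label: int  # index of the correct solution: 0 -> sol1, 1 -> sol2


def accuracy(items: list[PiqaItem], predictions: list[int]) -> float:
    """Fraction of items whose predicted index matches the label (0.0 to 1.0)."""
    if len(items) != len(predictions):
        raise ValueError("need exactly one prediction per item")
    if not items:
        raise ValueError("cannot score an empty item list")
    correct = sum(1 for item, pred in zip(items, predictions) if pred == item.label)
    return correct / len(items)


# Hypothetical items written for illustration; they are not taken from the dataset.
items = [
    PiqaItem(
        goal="Keep a stack of paper from blowing away outdoors.",
        sol1="Place a heavy stone on top of the stack.",
        sol2="Place a sheet of tissue paper on top of the stack.",
        label=0,
    ),
    PiqaItem(
        goal="Cool a hot drink quickly.",
        sol1="Wrap the cup in a wool blanket.",
        sol2="Set the cup in a bowl of ice water.",
        label=1,
    ),
]

predictions = [0, 0]  # hypothetical model choices: first correct, second wrong
print(accuracy(items, predictions))  # 0.5
```

A model that always picked the correct solution would score 1.0, and random guessing between two options would be expected to land near 0.5.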

Evaluation Stats
Total models: 9
Organizations: 2
Verified results: 0
Self-reported results: 9
Benchmark Details
Max score: 1
Language: en
Performance Overview
Score distribution and top performers

Score Distribution
Models: 9
Top score: 88.6%
Average score: 81.3%
High performers (80%+): 6
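The summary figures above can be reproduced from the nine scores in the leaderboard below. This sketch assumes the average is a plain arithmetic mean rounded to one decimal place and that "80%+" means a score of 80.0 or higher; the `summarize` helper is hypothetical.

```python
# The nine PIQA scores (in percent) listed in the leaderboard on this page.
scores = [88.6, 83.2, 81.7, 81.0, 81.0, 81.0, 78.9, 78.9, 77.6]


def summarize(scores: list[float], threshold: float = 80.0) -> dict:
    """Return model count, top score, mean (rounded to 1 decimal), and count at or above threshold."""
    return {
        "models": len(scores),
        "top": max(scores),
        "average": round(sum(scores) / len(scores), 1),
        "high_performers": sum(1 for s in scores if s >= threshold),
    }


print(summarize(scores))
# {'models': 9, 'top': 88.6, 'average': 81.3, 'high_performers': 6}
```

The unrounded mean is about 81.32, which rounds to the 81.3% shown above.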

Top Organizations

#1 Microsoft: 3 models, 82.4%
#2 Google: 6 models, 80.8%
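The organization figures are consistent with a per-organization mean of the leaderboard scores, and the sketch below checks that reading. The page does not name an organization for each leaderboard row, so the mapping used here (MIT-licensed entries to Microsoft, Gemma and Proprietary entries to Google) is an assumption inferred from the license column, and `org_averages` is a hypothetical helper.

```python
from collections import defaultdict

# Leaderboard rows as (organization, score in percent), in leaderboard order.
# ASSUMPTION: organizations are inferred from the license column
# (MIT -> Microsoft, Gemma/Proprietary -> Google); the page does not list them per row.
rows = [
    ("Microsoft", 88.6), ("Google", 83.2), ("Google", 81.7),
    ("Google", 81.0), ("Google", 81.0), ("Microsoft", 81.0),
    ("Google", 78.9), ("Google", 78.9), ("Microsoft", 77.6),
]


def org_averages(rows: list[tuple[str, float]]) -> list[tuple[str, int, float]]:
    """Group scores by organization; return (org, model count, mean score), best mean first."""
    grouped: dict[str, list[float]] = defaultdict(list)
    for org, score in rows:
        grouped[org].append(score)
    summary = [(org, len(s), round(sum(s) / len(s), 1)) for org, s in grouped.items()]
    return sorted(summary, key=lambda entry: entry[2], reverse=True)


for rank, (org, count, mean) in enumerate(org_averages(rows), start=1):
    print(f"#{rank} {org}: {count} models, {mean}%")
# #1 Microsoft: 3 models, 82.4%
# #2 Google: 6 models, 80.8%
```

Ranking by mean rather than by model count explains why Microsoft places first despite having fewer entries.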
Leaderboard
9 models ranked by performance on PIQA
Released       License       Score
Aug 23, 2024   MIT           88.6%
Jun 27, 2024   Gemma         83.2%
Jun 27, 2024   Gemma         81.7%
Jun 26, 2025   Proprietary   81.0%
May 20, 2025   Gemma         81.0%
Aug 23, 2024   MIT           81.0%
May 20, 2025   Gemma         78.9%
Jun 26, 2025   Proprietary   78.9%
Feb 1, 2025    MIT           77.6%
Resources