OpenBookQA

text

About

OpenBookQA is a question-answering benchmark modeled after open-book exams: models must combine elementary science facts with commonsense reasoning. Unlike traditional QA tasks, it supplies the relevant knowledge as a "book" of science facts. Answering a question still takes multi-step reasoning to connect those facts, so the benchmark tests both retrieval and logical inference.

Evaluation Stats

Total Models: 4

Organizations: 2

Verified Results: 0

Self-Reported: 4

Benchmark Details

Max Score: 1
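A Max Score of 1 suggests that scores are stored as fractions in [0, 1] and displayed as percentages. The following is a minimal sketch that assumes the score is plain accuracy over multiple-choice questions, each with four answer options (A to D). The `accuracy` helper is hypothetical, and the predictions and gold labels are made-up illustrative data, not outputs of any evaluated model.

```python
def accuracy(predictions: list[str], gold: list[str]) -> float:
    """Fraction of questions answered correctly, on a 0-1 scale (hypothetical helper)."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("need at least one question")
    # Count exact matches between predicted and gold answer letters.
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)


# Made-up answer letters for five questions, for illustration only.
preds = ["A", "C", "B", "D", "A"]
gold = ["A", "C", "D", "D", "B"]
score = accuracy(preds, gold)
print(f"{score:.3f} -> {score:.1%}")  # 0.600 -> 60.0%
```

Three of the five answers match, so the sketch reports 0.6 on the 0-1 scale, which would appear as 60.0% on the page.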

Performance Overview

Score distribution and top performers


Top Score: 89.6%

Average Score: 77.2%

High Performers (80%+): 1 model

Top Organizations

#1 Microsoft: 3 models, average score 82.7%

#2 Mistral AI: 1 model, average score 60.6%

Leaderboard

4 models ranked by performance on OpenBookQA

Rank	Model	Organization	Release Date	License	Score
#01	Phi-3.5-MoE-instruct	Microsoft	Aug 23, 2024	MIT	89.6%
#02	Phi-3.5-mini-instruct	Microsoft	Aug 23, 2024	MIT	79.2%
#03	Phi 4 Mini	Microsoft	Feb 1, 2025	MIT	79.2%
#04	Mistral NeMo Instruct	Mistral AI	Jul 18, 2024	Apache 2.0	60.6%
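The summary figures under Performance Overview (top score, average score, per-organization averages) match unweighted means of the four leaderboard scores. The sketch below recomputes them. It assumes simple arithmetic means rounded half-up to one decimal place, which is not confirmed by the page. The helpers `summarize`, `round1`, and `mean` are hypothetical and are not part of any leaderboard API.

```python
from collections import defaultdict
from decimal import Decimal, ROUND_HALF_UP

# Scores transcribed from the leaderboard above, as percentages.
# Decimal avoids binary floating-point error when rounding to one decimal place.
LEADERBOARD = [
    ("Phi-3.5-MoE-instruct", "Microsoft", Decimal("89.6")),
    ("Phi-3.5-mini-instruct", "Microsoft", Decimal("79.2")),
    ("Phi 4 Mini", "Microsoft", Decimal("79.2")),
    ("Mistral NeMo Instruct", "Mistral AI", Decimal("60.6")),
]


def round1(x: Decimal) -> Decimal:
    """Round to one decimal place, halves rounded up (assumed display rule)."""
    return x.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


def mean(values: list[Decimal]) -> Decimal:
    """Unweighted arithmetic mean."""
    return sum(values) / len(values)


def summarize(rows):
    """Recompute the page's summary statistics from (model, org, score) rows."""
    scores = [score for _, _, score in rows]
    by_org = defaultdict(list)
    for _, org, score in rows:
        by_org[org].append(score)
    return {
        "top": max(scores),
        "average": round1(mean(scores)),
        "high_performers": sum(score >= 80 for score in scores),
        "orgs": {org: (len(s), round1(mean(s))) for org, s in by_org.items()},
    }


summary = summarize(LEADERBOARD)
print(summary["top"], summary["average"], summary["high_performers"])
for org, (count, avg) in summary["orgs"].items():
    print(f"{org}: {count} model(s), average {avg}%")
```

Running it prints `89.6 77.2 1`, followed by `Microsoft: 3 model(s), average 82.7%` and `Mistral AI: 1 model(s), average 60.6%`. The overall mean is exactly 77.15, so the 77.2% shown on the page depends on halves being rounded up.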

Resources

Research Paper