SQuALITY

text

About

SQuALITY is a question-focused summarization benchmark built on 100 Project Gutenberg short stories, with 500 questions and 2,000 summaries written by trained writers. It tests a model's ability to produce a summary that answers a specific question about a long document, which requires both reading comprehension and selective extraction of the relevant information.

Evaluation Stats

Total Models: 5

Organizations: 2

Verified Results: 0

Self-Reported: 5

Benchmark Details

Max Score: 1

Language

Performance Overview

Score distribution and top performers

Score Distribution

5 models

Top Score: 24.3%

Average Score: 21.2%

High Performers (80%+)

Top Organizations

#1 Microsoft (2 models): 24.2%

#2 Amazon (3 models): 19.3%

Leaderboard

5 models ranked by performance on SQuALITY

Rank	Model	Organization	Date	License	Score
#01	Phi-3.5-mini-instruct	Microsoft	Aug 23, 2024	MIT	24.3%
#02	Phi-3.5-MoE-instruct	Microsoft	Aug 23, 2024	MIT	24.1%
#03	Nova Pro	Amazon	Nov 20, 2024	Proprietary	19.8%
#04	Nova Lite	Amazon	Nov 20, 2024	Proprietary	19.2%
#05	Nova Micro	Amazon	Nov 20, 2024	Proprietary	18.8%
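
The summary figures shown earlier on this page (top score, average score, and per-organization scores) can be reproduced from the five leaderboard rows. The sketch below is one way to do that, not the site's own method: it assumes an unweighted mean rounded to one decimal place, and the names `ROWS` and `summarize` are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Leaderboard rows copied from the table: (model, organization, score in %).
ROWS = [
    ("Phi-3.5-mini-instruct", "Microsoft", 24.3),
    ("Phi-3.5-MoE-instruct", "Microsoft", 24.1),
    ("Nova Pro", "Amazon", 19.8),
    ("Nova Lite", "Amazon", 19.2),
    ("Nova Micro", "Amazon", 18.8),
]


def summarize(rows):
    """Return top score, overall mean, and per-organization means.

    Assumption: means are unweighted and rounded to one decimal place.
    """
    scores = [score for _, _, score in rows]
    by_org = defaultdict(list)
    for _, org, score in rows:
        by_org[org].append(score)
    return {
        "top": max(scores),
        "average": round(mean(scores), 1),
        "by_org": {org: round(mean(vals), 1) for org, vals in by_org.items()},
    }


if __name__ == "__main__":
    print(summarize(ROWS))
```

Under these assumptions the computed values agree with the page: a top score of 24.3%, an overall average of 21.2%, and organization scores of 24.2% for Microsoft and 19.3% for Amazon.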

Resources

Research Paper