Qasper

text

About

QASPER is a question-answering benchmark for scientific research papers. It contains 5,049 information-seeking questions over 1,585 Natural Language Processing papers. It tests a model's ability to locate the relevant information in long academic texts and to answer domain-specific questions that require understanding the paper as a whole.

Evaluation Stats

Total Models: 2

Organizations: 1

Verified Results: 0

Self-Reported: 2

Benchmark Details

Max Score: 1

Language

Performance Overview

Score distribution and top performers

Score Distribution: 2 models

Top Score: 41.9%

Average Score: 40.9%

High Performers (80%+): 0

Top Organizations

#1 Microsoft: 2 models, 40.9%

Leaderboard

2 models ranked by performance on Qasper

Rank	Model	Organization	Date	License	Score
#01	Phi-3.5-mini-instruct	Microsoft	Aug 23, 2024	MIT	41.9%
#02	Phi-3.5-MoE-instruct	Microsoft	Aug 23, 2024	MIT	40.0%
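
The summary figures on this page can be recomputed from the two leaderboard rows. The following is a minimal sketch, not the site's actual implementation: the `LEADERBOARD` list and the `summarize` helper are hypothetical names introduced for illustration, and the scores are copied by hand from the table. The mean of the two listed scores works out to 40.95, which is consistent with the 40.9% shown as the average, presumably after truncation to one decimal place.

```python
from statistics import mean

# Rows copied by hand from the leaderboard: (model, organization, score in percent).
LEADERBOARD = [
    ("Phi-3.5-mini-instruct", "Microsoft", 41.9),
    ("Phi-3.5-MoE-instruct", "Microsoft", 40.0),
]


def summarize(rows, high_threshold=80.0):
    """Recompute the page's summary statistics from leaderboard rows.

    `high_threshold` mirrors the "High Performers (80%+)" cutoff on the page.
    """
    scores = [score for _, _, score in rows]
    top_model, _, top_score = max(rows, key=lambda row: row[2])
    return {
        "models": len(rows),
        "organizations": len({org for _, org, _ in rows}),
        "top_model": top_model,
        "top_score": top_score,
        "average_score": mean(scores),
        "high_performers": sum(score >= high_threshold for score in scores),
    }


summary = summarize(LEADERBOARD)
for key, value in summary.items():
    print(f"{key}: {value}")
```

Because the page lists a maximum score of 1 while displaying percentages, a raw score on the 0 to 1 scale would need to be multiplied by 100 before being passed to this helper.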
