DROP

About

DROP (Discrete Reasoning Over Paragraphs) is a reading comprehension benchmark of 96,000 question-answer pairs over 6,700 paragraphs. Created by researchers at the Allen Institute for AI (AI2) and collaborators, it requires discrete operations over the text, such as addition, counting, sorting, and comparison, often chained across several steps. Answering correctly therefore takes more than retrieving a span: a model must locate the relevant quantities or events in the passage and then compute with them.
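To make "discrete reasoning" concrete, here is a minimal sketch of the kind of step such a question demands: extract several numbers from a passage, then compare and subtract them. The passage, the question, and the helper names `field_goal_yards` and `answer_difference` are invented for illustration; they are not taken from the DROP dataset or its tooling.

```python
import re

# Hypothetical passage and question, written for illustration only;
# they are NOT taken from the DROP dataset.
passage = (
    "The home team opened with a 23-yard field goal, added a 41-yard "
    "field goal before halftime, and sealed the win with a 35-yard field goal."
)
question = "How many yards longer was the longest field goal than the shortest?"


def field_goal_yards(text: str) -> list[int]:
    """Extract every '<N>-yard field goal' distance mentioned in the text."""
    return [int(n) for n in re.findall(r"(\d+)-yard field goal", text)]


def answer_difference(text: str) -> int:
    """Discrete step: compare the extracted values, then subtract."""
    yards = field_goal_yards(text)
    return max(yards) - min(yards)


print(question)
print(answer_difference(passage))  # 41 - 23 = 18
```

No single span of the passage contains the answer 18; it only appears after the extraction and the arithmetic, which is what separates this style of question from plain span retrieval.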

Evaluation Stats

Total Models: 27

Organizations: 8

Verified Results: 0

Self-Reported: 26

Benchmark Details

Max Score: 1

Language

Performance Overview

Score distribution and top performers

Score Distribution

27 models

Top Score

91.6%

Average Score

73.3%

High Performers (80%+)

Top Organizations

#1	DeepSeek	1 model	91.6%
#2	Anthropic	6 models	82.9%
#3	Amazon	3 models	81.6%
#4	OpenAI	5 models	80.0%
#5	Microsoft	1 model	75.5%

Leaderboard

27 models ranked by performance on DROP

Rank	Model	Organization	Release Date	License	Score
#01	DeepSeek-V3	DeepSeek	Dec 25, 2024	MIT + Model License (Commercial use allowed)	91.6%
#02	Claude 3.5 Sonnet	Anthropic	Oct 22, 2024	Proprietary	87.1%
#03	Claude 3.5 Sonnet	Anthropic	Jun 21, 2024	Proprietary	87.1%
#04	GPT-4 Turbo	OpenAI	Apr 9, 2024	Proprietary	86.0%
#05	Nova Pro	Amazon	Nov 20, 2024	Proprietary	85.4%
#06	Llama 3.1 405B Instruct	Meta	Jul 23, 2024	Llama 3.1 Community License	84.8%
#07	GPT-4o	OpenAI	May 13, 2024	Proprietary	83.4%
#08	Claude 3 Opus	Anthropic	Feb 29, 2024	Proprietary	83.1%
#09	Claude 3.5 Haiku	Anthropic	Oct 22, 2024	Proprietary	83.1%
#10	GPT-4	OpenAI	Jun 13, 2023	Proprietary	80.9%

Showing 1 to 10 of 27 models
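
The ten visible rows can be summarized per organization with a short script. This is a sketch only: the `ROWS` list is transcribed by hand from the table above, the release dates in parentheses are added just to tell the two Claude 3.5 Sonnet entries apart, and the helper name `summarize` is hypothetical. Because 17 of the 27 models are not shown, these per-organization figures cover the top 10 only and are not expected to match the "Top Organizations" numbers.

```python
from collections import defaultdict

# (model, organization, score in percent), transcribed from the visible
# top-10 rows of the leaderboard. The other 17 models are not listed here.
ROWS = [
    ("DeepSeek-V3", "DeepSeek", 91.6),
    ("Claude 3.5 Sonnet (Oct 22, 2024)", "Anthropic", 87.1),
    ("Claude 3.5 Sonnet (Jun 21, 2024)", "Anthropic", 87.1),
    ("GPT-4 Turbo", "OpenAI", 86.0),
    ("Nova Pro", "Amazon", 85.4),
    ("Llama 3.1 405B Instruct", "Meta", 84.8),
    ("GPT-4o", "OpenAI", 83.4),
    ("Claude 3 Opus", "Anthropic", 83.1),
    ("Claude 3.5 Haiku", "Anthropic", 83.1),
    ("GPT-4", "OpenAI", 80.9),
]


def summarize(rows):
    """Return {organization: (model_count, mean_score_percent)} for the rows."""
    scores_by_org = defaultdict(list)
    for _model, org, score in rows:
        scores_by_org[org].append(score)
    return {
        org: (len(scores), round(sum(scores) / len(scores), 1))
        for org, scores in scores_by_org.items()
    }


summary = summarize(ROWS)
for org, (count, mean) in sorted(summary.items(), key=lambda kv: -kv[1][1]):
    print(f"{org}: {count} model(s), mean {mean}%")

# Every visible row clears the 80% "high performer" threshold.
high_performers = [model for model, _org, score in ROWS if score >= 80.0]
print(len(high_performers), "of", len(ROWS), "visible models score 80% or higher")
```

Dividing a percentage by 100 gives the score on the benchmark's 0-to-1 scale, so 91.6% corresponds to 0.916.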

Resources

Research Paper