DROP

About

DROP (Discrete Reasoning Over Paragraphs) is a reading comprehension benchmark of 96,000 question-answer pairs over 6,700 paragraphs. Created by Allen AI, it requires models to resolve references in a question and then perform discrete operations over the passage, such as addition, counting, or sorting. Answering correctly therefore takes multi-step numerical and logical reasoning rather than simple text retrieval.

Evaluation Stats
Total Models: 27
Organizations: 8
Verified Results: 0
Self-Reported: 26
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers

Score Distribution

27 models
Top Score: 91.6%
Average Score: 73.3%
High Performers (80%+): 11

Top Organizations

#1 DeepSeek (1 model): 91.6%
#2 Anthropic (6 models): 82.9%
#3 Amazon (3 models): 81.6%
#4 OpenAI (5 models): 80.0%
#5 Microsoft (1 model): 75.5%
Leaderboard
27 models ranked by performance on DROP
Date | License | Score
Dec 25, 2024 | MIT + Model License (Commercial use allowed) | 91.6%
Oct 22, 2024 | Proprietary | 87.1%
Jun 21, 2024 | Proprietary | 87.1%
Apr 9, 2024 | Proprietary | 86.0%
Nov 20, 2024 | Proprietary | 85.4%
Jul 23, 2024 | Llama 3.1 Community License | 84.8%
May 13, 2024 | Proprietary | 83.4%
Feb 29, 2024 | Proprietary | 83.1%
Oct 22, 2024 | Proprietary | 83.1%
Jun 13, 2023 | Proprietary | 80.9%
Showing 1 to 10 of 27 models
Resources