DROP
About
DROP (Discrete Reasoning Over Paragraphs) is a reading comprehension benchmark with about 96,000 question-answer pairs over roughly 6,700 paragraphs. Created by Allen AI, it tests whether a model can perform multi-step reasoning over text, including numerical calculations (such as addition, subtraction, and counting) and logical operations (such as sorting and comparison). The questions go beyond simple text retrieval: a model must locate the relevant facts in a paragraph and then compute the answer from them.
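To make "discrete reasoning" concrete, here is a minimal Python sketch of the kind of operation a DROP-style question calls for. The passage and question are invented for illustration and are not taken from the dataset, and the hard-coded extraction step stands in for whatever reading a real model would do.

```python
import re

# Hypothetical passage and question, invented for illustration (not from DROP).
passage = (
    "The home team scored 14 points in the first half and 17 points in the "
    "second half, while the visitors finished with 24 points."
)
question = "How many more points did the home team score than the visitors?"

# Step 1: pull every integer out of the passage, in order of appearance.
numbers = [int(n) for n in re.findall(r"\d+", passage)]

# Step 2: apply the discrete operations the question requires:
# add the two home-team halves, then subtract the visitors' total.
home_total = numbers[0] + numbers[1]
visitors_total = numbers[2]
answer = home_total - visitors_total

print(answer)  # prints 7
```

The answer (7) never appears in the passage, which is why retrieval alone is not enough for questions of this kind.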
Evaluation Stats
Total Models: 27
Organizations: 8
Verified Results: 0
Self-Reported: 26
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers
Score Distribution
Models: 27
Top Score: 91.6%
Average Score: 73.3%
High Performers (80%+): 11
Top Organizations
Rank | Organization | Models | Score |
---|---|---|---|
#1 | DeepSeek | 1 | 91.6% |
#2 | Anthropic | 6 | 82.9% |
#3 | Amazon | 3 | 81.6% |
#4 | OpenAI | 5 | 80.0% |
#5 | Microsoft | 1 | 75.5% |
Leaderboard
27 models ranked by performance on DROP
Date | License | Score |
---|---|---|
Dec 25, 2024 | MIT + Model License (Commercial use allowed) | 91.6% |
Oct 22, 2024 | Proprietary | 87.1% |
Jun 21, 2024 | Proprietary | 87.1% |
Apr 9, 2024 | Proprietary | 86.0% |
Nov 20, 2024 | Proprietary | 85.4% |
Jul 23, 2024 | Llama 3.1 Community License | 84.8% |
May 13, 2024 | Proprietary | 83.4% |
Feb 29, 2024 | Proprietary | 83.1% |
Oct 22, 2024 | Proprietary | 83.1% |
Jun 13, 2023 | Proprietary | 80.9% |
Showing 1 to 10 of 27 models
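The summary figures in the Performance Overview (top score, average, and the 80%+ count) are simple aggregates over the leaderboard scores. The sketch below recomputes them for the ten visible rows only. The function names are illustrative, and the 0-1 conversion assumes that "Max Score: 1" means scores are stored as fractions and displayed as percentages, which the page does not state explicitly.

```python
from statistics import mean

# Scores (percent) copied from the ten visible leaderboard rows.
visible_scores = [91.6, 87.1, 87.1, 86.0, 85.4, 84.8, 83.4, 83.1, 83.1, 80.9]

# Cutoff taken from the page's "High Performers (80%+)" label.
HIGH_PERFORMER_THRESHOLD = 80.0


def summarize(scores):
    """Return the top score, mean score, and count of scores at or above the cutoff."""
    return {
        "top": max(scores),
        "average": mean(scores),
        "high_performers": sum(s >= HIGH_PERFORMER_THRESHOLD for s in scores),
    }


def to_unit_scale(percent):
    """Convert a percentage to a 0-1 score (assumed meaning of "Max Score: 1")."""
    return percent / 100.0


summary = summarize(visible_scores)
print(f"Top: {summary['top']:.1f}%")
print(f"Average (visible rows only): {summary['average']:.2f}%")
print(f"High performers (visible rows only): {summary['high_performers']}")
print(f"Top score on a 0-1 scale: {to_unit_scale(summary['top']):.3f}")
```

These numbers cover only the first page of results, so the average comes out well above the page's 73.3% for all 27 models. All ten visible rows clear the 80% cutoff, while the page reports 11 high performers, so at least one more such model sits on a later page.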