WinoGrande
text
About
WinoGrande is a large-scale commonsense reasoning benchmark of 44,000 pronoun resolution problems, designed to be harder for machine learning models than the original Winograd Schema Challenge. The dataset was filtered with the AfLite algorithm to systematically reduce dataset bias. It tests a model's ability to resolve ambiguous pronouns in cases where the correct referent can only be identified through world knowledge and commonsense reasoning.
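To make the task format concrete, here is a minimal sketch of how a two-option pronoun resolution item might be represented and scored by accuracy. It assumes the commonly distributed fill-in-the-blank layout (a sentence with a single `_` blank, two candidate fillers, and a gold label of `"1"` or `"2"`). The `Item` class, the `fill` and `accuracy` helpers, the `always_first` baseline, and the two example sentences are all hypothetical and for illustration only; they are not taken from the benchmark or from any evaluation harness.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Item:
    """Hypothetical container for one two-option item (illustrative only)."""
    sentence: str  # contains exactly one "_" blank
    option1: str
    option2: str
    answer: str    # gold label: "1" or "2"


def fill(item: Item, choice: str) -> str:
    """Return the sentence with the blank replaced by the chosen option."""
    option = item.option1 if choice == "1" else item.option2
    return item.sentence.replace("_", option, 1)


def accuracy(items: List[Item], predict: Callable[[Item], str]) -> float:
    """Fraction of items where the predicted label matches the gold label."""
    if not items:
        raise ValueError("accuracy is undefined for an empty item list")
    correct = sum(1 for item in items if predict(item) == item.answer)
    return correct / len(items)


# Illustrative twin sentences in the classic Winograd style, not dataset entries.
items = [
    Item("The trophy doesn't fit into the brown suitcase because the _ is too large.",
         "trophy", "suitcase", "1"),
    Item("The trophy doesn't fit into the brown suitcase because the _ is too small.",
         "trophy", "suitcase", "2"),
]


def always_first(item: Item) -> str:
    """Trivial baseline that always picks option 1."""
    return "1"


print(fill(items[0], "1"))
print(accuracy(items, always_first))
```

Because the two sentences differ by one word and have opposite answers, a baseline that ignores the sentence cannot beat chance on the pair. A real evaluation would replace `always_first` with a function that queries a model, for example by comparing the likelihood of the two filled-in sentences.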
Evaluation Stats
Total Models: 19
Organizations: 8
Verified Results: 0
Self-Reported: 19
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers
Score Distribution: 19 models
Top Score: 87.5%
Average Score: 77.0%
High Performers (80%+): 9
Top Organizations
#1 OpenAI: 1 model, 87.5%
#2 Cohere: 1 model, 85.4%
#3 NVIDIA: 1 model, 84.5%
#4 Alibaba Cloud / Qwen Team: 4 models, 80.2%
#5 Mistral AI: 2 models, 76.0%
Leaderboard
19 models ranked by performance on WinoGrande
Date | License | Score
---|---|---
Jun 13, 2023 | Proprietary | 87.5%
Aug 30, 2024 | CC BY-NC | 85.4%
Jul 23, 2024 | tongyi-qianwen | 85.1%
Oct 1, 2024 | Llama 3.1 Community License | 84.5%
Jun 27, 2024 | Gemma | 83.7%
Sep 19, 2024 | Apache 2.0 | 82.0%
Aug 23, 2024 | MIT | 81.3%
Sep 19, 2024 | Apache 2.0 | 80.8%
Jun 27, 2024 | Gemma | 80.6%
Jul 18, 2024 | Apache 2.0 | 76.8%
Showing 1 to 10 of 19 models