Winogrande

About

WinoGrande is a large-scale commonsense reasoning benchmark of 44,000 pronoun resolution problems, built as a harder successor to the original Winograd Schema Challenge. Its dataset-specific biases are reduced with the AfLite adversarial filtering algorithm, so the evaluation tests whether models can resolve ambiguous pronouns using world knowledge and commonsense reasoning rather than surface patterns.
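AfLite, as described for WinoGrande, repeatedly trains linear classifiers on random splits of precomputed instance embeddings, scores each instance by how often it is predicted correctly when held out, and removes the most predictable ones. The following is a minimal, non-authoritative sketch under stated assumptions: feature vectors are given, labels are binary, a plain perceptron stands in for the paper's linear classifier, and the parameter names and defaults (`n_partitions`, `train_frac`, `k`, `tau`, `target_size`) are illustrative, not the values used to build the dataset.

```python
import random


def train_perceptron(X, y, dim, epochs=10):
    """Plain perceptron on dense feature vectors; labels are 0 or 1.

    Assumption: stands in for the linear classifier AfLite trains per split.
    """
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            pred = 1 if sum(wj * xj for wj, xj in zip(w, xi)) + b > 0 else 0
            if pred != yi:
                step = 1.0 if yi == 1 else -1.0
                w = [wj + step * xj for wj, xj in zip(w, xi)]
                b += step
    return w, b


def predict(model, x):
    w, b = model
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else 0


def aflite(X, y, target_size, n_partitions=16, train_frac=0.5, k=5, tau=0.75, seed=0):
    """Return indices of the instances kept after iterative adversarial filtering.

    X holds precomputed feature vectors (hypothetical here; AfLite uses
    pretrained-model embeddings) and y holds binary labels.
    """
    rng = random.Random(seed)
    dim = len(X[0])
    keep = list(range(len(X)))
    while len(keep) > target_size:
        correct = dict.fromkeys(keep, 0)
        seen = dict.fromkeys(keep, 0)
        for _ in range(n_partitions):
            order = keep[:]
            rng.shuffle(order)
            cut = int(len(order) * train_frac)
            train, held_out = order[:cut], order[cut:]
            model = train_perceptron([X[i] for i in train], [y[i] for i in train], dim)
            for i in held_out:
                seen[i] += 1
                correct[i] += predict(model, X[i]) == y[i]
        # Predictability: share of held-out rounds where the instance was classified correctly.
        scores = {i: correct[i] / seen[i] for i in keep if seen[i]}
        candidates = sorted((i for i in scores if scores[i] >= tau), key=scores.get, reverse=True)
        # Remove up to k of the most predictable instances, never going below target_size.
        budget = min(k, len(keep) - target_size)
        removed = set(candidates[:budget])
        keep = [i for i in keep if i not in removed]
        if len(removed) < k:
            break  # too few highly predictable instances left, so stop filtering
    return keep
```

With toy data where the first feature alone reveals the label, held-out predictions are correct, so those instances score as highly predictable and are removed in batches of `k`; setting `tau` above 1 disables removal entirely.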

Evaluation Stats
Total Models: 19
Organizations: 8
Verified Results: 0
Self-Reported: 19
Benchmark Details
Max Score: 1
Language: English (en)
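WinoGrande is scored by accuracy, which is why the maximum score is 1. Each item is a sentence with a blank and two candidate fillers, and a model is counted correct when it prefers the right one. This sketch assumes field names mirroring the public dataset release (`sentence`, `option1`, `option2`, `answer`) and takes a hypothetical `score` callable in place of a real model, for example a language model's log-likelihood of the filled-in sentence.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Item:
    sentence: str  # contains a single "_" blank
    option1: str
    option2: str
    answer: str    # "1" or "2"


def fill(item: Item, option: str) -> str:
    """Substitute one candidate into the blank."""
    return item.sentence.replace("_", option, 1)


def accuracy(items: Sequence[Item], score: Callable[[str], float]) -> float:
    """Fraction of items where the higher-scoring completion matches the answer.

    `score` is a hypothetical plausibility function; ties go to option 1.
    """
    if not items:
        return 0.0
    correct = 0
    for item in items:
        s1 = score(fill(item, item.option1))
        s2 = score(fill(item, item.option2))
        predicted = "1" if s1 >= s2 else "2"
        correct += predicted == item.answer
    return correct / len(items)
```

For a Winograd-style twin pair (a toy example, not necessarily drawn from the dataset), a scorer that returns the same value for every sentence always picks option 1 and lands at chance level, 0.5.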
Performance Overview
Score distribution and top performers

Score Distribution

19 models
Top Score: 87.5%
Average Score: 77.0%
High Performers (80%+): 9
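The overview figures are simple aggregates over per-model scores. Only the ten rows on the first leaderboard page are available here, so this sketch computes the same statistics over those ten; its average comes out higher than the page's 77.0%, which covers all 19 models. The `summarize` helper and its `threshold` parameter are illustrative names, with "80%+" read as a score of at least 80.

```python
def summarize(scores, threshold=80.0):
    """Model count, top score, mean score, and count at or above threshold (percent)."""
    if not scores:
        raise ValueError("need at least one score")
    return {
        "models": len(scores),
        "top": max(scores),
        "average": round(sum(scores) / len(scores), 1),
        "high_performers": sum(s >= threshold for s in scores),
    }


# Scores from the ten leaderboard rows shown on this page (percent).
visible = [87.5, 85.4, 85.1, 84.5, 83.7, 82.0, 81.3, 80.8, 80.6, 76.8]
print(summarize(visible))
```

Nine of the ten visible scores are at least 80%, matching the page's count of nine high performers, which implies none of the nine unlisted models reaches that threshold.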

Top Organizations

#1 OpenAI: 1 model, 87.5%
#2 Cohere: 1 model, 85.4%
#3 NVIDIA: 1 model, 84.5%
#4 Alibaba Cloud / Qwen Team: 4 models, 80.2%
#5 Mistral AI: 2 models, 76.0%
Leaderboard
19 models ranked by performance on Winogrande
Date | License | Score
Jun 13, 2023 | Proprietary | 87.5%
Aug 30, 2024 | CC BY-NC | 85.4%
Jul 23, 2024 | tongyi-qianwen | 85.1%
Oct 1, 2024 | Llama 3.1 Community License | 84.5%
Jun 27, 2024 | Gemma | 83.7%
Sep 19, 2024 | Apache 2.0 | 82.0%
Aug 23, 2024 | MIT | 81.3%
Sep 19, 2024 | Apache 2.0 | 80.8%
Jun 27, 2024 | Gemma | 80.6%
Jul 18, 2024 | Apache 2.0 | 76.8%
Showing 1 to 10 of 19 models
Resources