ZebraLogic
text
About
ZebraLogic is a logical reasoning benchmark of 1,000 logic grid puzzles (Zebra puzzles) ranging in size from 2x2 to 6x6. It tests the ability of large language models to solve constraint satisfaction problems by deducing the unique assignment of values that satisfies a set of logical clues. Results are measured as both puzzle-level and cell-wise accuracy, across easy and hard puzzles.
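The difference between the two metrics can be illustrated with a minimal sketch. It assumes each puzzle is represented as a list of houses, each mapping an attribute name to a value; this representation and the function names `score_puzzle` and `accuracies` are hypothetical and are not taken from the official ZebraLogic evaluation code. Puzzle-level accuracy counts a puzzle only when every cell is correct, while cell-wise accuracy gives partial credit for each correct cell.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: one dict per house, mapping attribute -> value.
Grid = List[Dict[str, str]]


def score_puzzle(predicted: Grid, truth: Grid) -> Tuple[int, int]:
    """Return (correct_cells, total_cells) for a single puzzle."""
    correct = 0
    total = 0
    for i, true_house in enumerate(truth):
        # A missing house in the prediction counts as all cells wrong.
        pred_house = predicted[i] if i < len(predicted) else {}
        for attribute, true_value in true_house.items():
            total += 1
            if pred_house.get(attribute) == true_value:
                correct += 1
    return correct, total


def accuracies(predictions: List[Grid], truths: List[Grid]) -> Tuple[float, float]:
    """Return (puzzle_level_accuracy, cell_wise_accuracy) over all puzzles."""
    solved = 0
    correct_cells = 0
    total_cells = 0
    for predicted, truth in zip(predictions, truths):
        correct, total = score_puzzle(predicted, truth)
        correct_cells += correct
        total_cells += total
        if correct == total:
            solved += 1  # puzzle counts only if every cell matches
    return solved / len(truths), correct_cells / total_cells


# Two toy 2x2 puzzles sharing the same solution.
truth = [{"name": "Alice", "pet": "cat"}, {"name": "Bob", "pet": "dog"}]
perfect = [{"name": "Alice", "pet": "cat"}, {"name": "Bob", "pet": "dog"}]
swapped = [{"name": "Alice", "pet": "dog"}, {"name": "Bob", "pet": "cat"}]

puzzle_acc, cell_acc = accuracies([perfect, swapped], [truth, truth])
print(puzzle_acc, cell_acc)
```

In this toy case one of the two puzzles is fully solved, and six of the eight cells are correct, so the cell-wise figure is higher than the puzzle-level one.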
Evaluation Stats
Total Models: 3
Organizations: 2
Verified Results: 0
Self-Reported: 3
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers
Score Distribution: 3 models
Top Score: 95.0%
Average Score: 91.0%
High Performers (80%+): 3
Top Organizations
#1 Alibaba Cloud / Qwen Team: 1 model, 95.0%
#2 Moonshot AI: 2 models, 89.0%
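The summary figures above follow directly from the three leaderboard scores (95.0%, 89.0%, 89.0%). As a quick check, here is a small sketch of the arithmetic; the variable names are illustrative only, and the 80% threshold is the one stated in the "High Performers" label.

```python
from statistics import mean

# Scores of the three listed models, in percent.
scores = [95.0, 89.0, 89.0]

top_score = max(scores)
average_score = round(mean(scores), 1)  # (95.0 + 89.0 + 89.0) / 3
high_performers = sum(1 for s in scores if s >= 80.0)

print(top_score, average_score, high_performers)
```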
Leaderboard
3 models ranked by performance on ZebraLogic
Date | License | Score |
---|---|---|
Jul 22, 2025 | Apache 2.0 | 95.0% |
Jul 11, 2025 | MIT | 89.0% |
Sep 5, 2025 | MIT | 89.0% |