ZebraLogic

About

ZebraLogic is a logical reasoning benchmark of 1,000 logic grid puzzles (Zebra puzzles) whose sizes range from 2x2 to 6x6. It tests how well large language models solve constraint satisfaction problems, where each puzzle must be solved by deducing the unique assignment of values from a set of logical clues. Results are reported as both puzzle-level accuracy (the whole grid is correct) and cell-wise accuracy (the fraction of cells filled correctly), across easy and hard puzzles.
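To make the task and the two metrics concrete, here is a minimal sketch built on a small hypothetical 3-house puzzle that was made up for illustration and is not taken from ZebraLogic. It brute-forces every assignment with `itertools.permutations`, checks that exactly one assignment satisfies the clues, and then scores a hypothetical model answer. The helper names `satisfies_clues`, `solve`, and `score` are illustrative only. How ZebraLogic aggregates these per-puzzle numbers over the full benchmark is not described on this page.

```python
from itertools import permutations

# Hypothetical 3-house puzzle for illustration only (not from ZebraLogic).
# Each house 1..3 has exactly one Name and one Pet; no value is shared.
NAMES = ("Alice", "Bob", "Carol")
PETS = ("cat", "dog", "fish")


def satisfies_clues(names, pets):
    """names[i] and pets[i] are the values placed in house i + 1."""
    house = {v: i + 1 for i, v in enumerate(names)}
    house.update({v: i + 1 for i, v in enumerate(pets)})
    return (
        house["Alice"] == 1                     # Clue 1: Alice lives in house 1.
        and house["Bob"] == house["dog"]        # Clue 2: Bob owns the dog.
        and house["fish"] == house["dog"] + 1   # Clue 3: the fish is directly right of the dog.
    )


def solve():
    """Brute-force search over all assignments; return every grid that fits the clues."""
    solutions = []
    for names in permutations(NAMES):
        for pets in permutations(PETS):
            if satisfies_clues(names, pets):
                # Flatten into a grid of cells keyed by (house, attribute).
                grid = {}
                for i, (name, pet) in enumerate(zip(names, pets)):
                    grid[(i + 1, "Name")] = name
                    grid[(i + 1, "Pet")] = pet
                solutions.append(grid)
    return solutions


def score(predicted, gold):
    """Return (puzzle_solved, cell_accuracy) for a single puzzle."""
    correct = sum(predicted.get(cell) == value for cell, value in gold.items())
    return correct == len(gold), correct / len(gold)


solutions = solve()
assert len(solutions) == 1  # a well-posed logic grid puzzle has exactly one solution
gold = solutions[0]

# A hypothetical model answer that swaps the pets in houses 1 and 3.
predicted = dict(gold)
predicted[(1, "Pet")], predicted[(3, "Pet")] = gold[(3, "Pet")], gold[(1, "Pet")]

print(score(gold, gold))       # (True, 1.0)
print(score(predicted, gold))  # (False, 0.6666666666666666)
```

The swapped answer gets 4 of 6 cells right, so it earns partial cell-wise credit but counts as unsolved at the puzzle level. This is why the two metrics can diverge sharply on larger grids. Brute force is only practical for tiny grids like this one, because a 6x6 puzzle has (6!)^6 candidate assignments.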

Evaluation Stats
Total Models: 3
Organizations: 2
Verified Results: 0
Self-Reported: 3
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers

Score Distribution (3 models)
Top Score: 95.0%
Average Score: 91.0%
High Performers (80%+): 3

Top Organizations

#1 Alibaba Cloud / Qwen Team: 1 model, 95.0%
#2 Moonshot AI: 2 models, 89.0%
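The overview figures follow directly from the three leaderboard scores on this page. The sketch below recomputes them, assuming scores are stored as fractions of the Max Score of 1. The page does not give model names, so the organization attached to each row is an assumption inferred from the Top Organizations counts: Alibaba Cloud / Qwen Team has one model, which must be the 95.0% entry, and Moonshot AI has two, which must be the two 89.0% entries. It is also an assumption that organizations are ranked by their best score. The page does not say whether it uses the best or the mean score per organization, but both give the same values here.

```python
from collections import defaultdict

# Leaderboard scores as fractions of Max Score = 1.
# Organization per row is inferred from the "Top Organizations" counts (assumption).
ROWS = [
    ("Alibaba Cloud / Qwen Team", 0.95),
    ("Moonshot AI", 0.89),
    ("Moonshot AI", 0.89),
]

scores = [s for _, s in ROWS]
top = max(scores)
average = sum(scores) / len(scores)
high_performers = sum(s >= 0.80 for s in scores)  # models scoring 80% or more

# Group scores by organization.
by_org = defaultdict(list)
for org, s in ROWS:
    by_org[org].append(s)

# Assumption: rank organizations by their best model's score.
ranking = sorted(by_org.items(), key=lambda kv: max(kv[1]), reverse=True)

print(f"Top Score: {top:.1%}")
print(f"Average Score: {average:.1%}")
print(f"High Performers (80%+): {high_performers}")
for rank, (org, org_scores) in enumerate(ranking, start=1):
    print(f"#{rank} {org}: {len(org_scores)} model(s), {max(org_scores):.1%}")
```

Run as written, this prints a top score of 95.0%, an average of 91.0% ((95.0 + 89.0 + 89.0) / 3), and 3 high performers, which matches the figures above.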
Leaderboard
3 models ranked by performance on ZebraLogic
Date           License      Score
Jul 22, 2025   Apache 2.0   95.0%
Jul 11, 2025   MIT          89.0%
Sep 5, 2025    MIT          89.0%
Resources