AutoLogi
Multilingual
text
About
AutoLogi is a bilingual benchmark featuring automatically generated logic puzzles for evaluating Large Language Models' reasoning abilities. Using program-based verification and controllable difficulty levels, it creates open-ended logic puzzles that test systematic reasoning rather than pattern matching. The benchmark provides reliable assessment of logical thinking capabilities through synthesized puzzles with verified solutions, offering more robust evaluation than traditional multiple-choice formats.
Evaluation Stats
Total Models: 2
Organizations: 1
Verified Results: 0
Self-Reported: 2
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers
Score Distribution
2 models
Top Score
89.5%
Average Score
89.5%
High Performers (80%+): 2

Top Organizations
#1 Moonshot AI (2 models, 89.5%)
Leaderboard
2 models ranked by performance on AutoLogi
| Date | License | Score |
|---|---|---|
| Jul 11, 2025 | MIT | 89.5% |
| Sep 5, 2025 | MIT | 89.5% |