AutoLogi
Multilingual
text
About
AutoLogi is a bilingual benchmark featuring automatically generated logic puzzles for evaluating Large Language Models' reasoning abilities. Using program-based verification and controllable difficulty levels, it creates open-ended logic puzzles that test systematic reasoning rather than pattern matching. The benchmark provides reliable assessment of logical thinking capabilities through synthesized puzzles with verified solutions, offering more robust evaluation than traditional multiple-choice formats.
Evaluation Stats
Total Models: 2
Organizations: 1
Verified Results: 0
Self-Reported: 2
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers
Score Distribution
2 models
Top Score
89.5%
Average Score
89.5%
High Performers (80%+): 2

Top Organizations
#1 Moonshot AI (2 models, 89.5%)
Leaderboard
2 models ranked by performance on AutoLogi
| Date | License | Score |
|---|---|---|
| Jul 11, 2025 | MIT | 89.5% |
| Sep 5, 2025 | MIT | 89.5% |