COLLIE

About

COLLIE is a systematic framework for evaluating constrained text generation in large language models. The benchmark tests a model's ability to generate text under compositional constraints that span diverse generation levels and modeling challenges. Its pipeline has four stages: constraint structure specification, example extraction, instruction rendering, and evaluation of the generated text against the specified constraints.
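To make the idea of compositional constraints concrete, here is a minimal Python sketch of how a constraint could pair a rendered instruction with a checker, and how several constraints could be combined and scored. This is not COLLIE's actual API: the names `Constraint`, `word_count_is`, `last_word_is`, `all_of`, and `satisfaction_rate` are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Constraint:
    """Hypothetical constraint: an instruction string plus a checker."""
    instruction: str
    check: Callable[[str], bool]


def word_count_is(n: int) -> Constraint:
    # Satisfied when the text has exactly n whitespace-separated words.
    return Constraint(
        instruction=f"Write a sentence with exactly {n} words.",
        check=lambda text: len(text.split()) == n,
    )


def last_word_is(word: str) -> Constraint:
    # Satisfied when the final word, ignoring trailing punctuation, matches.
    def check(text: str) -> bool:
        words = text.split()
        return bool(words) and words[-1].strip(".,!?") == word

    return Constraint(
        instruction=f"The last word must be '{word}'.",
        check=check,
    )


def all_of(constraints: List[Constraint]) -> Constraint:
    # Composition: the combined constraint holds only if every part holds.
    return Constraint(
        instruction=" ".join(c.instruction for c in constraints),
        check=lambda text: all(c.check(text) for c in constraints),
    )


def satisfaction_rate(constraint: Constraint, outputs: List[str]) -> float:
    # Fraction of model outputs that satisfy the constraint.
    if not outputs:
        return 0.0
    return sum(constraint.check(o) for o in outputs) / len(outputs)


combined = all_of([word_count_is(5), last_word_is("dog")])
outputs = ["I walked my old dog.", "I walked my dog."]
rate = satisfaction_rate(combined, outputs)
```

In this sketch the first output satisfies both parts and the second fails the word count, so only half of the outputs pass. A score defined this way lies between 0 and 1, which would be consistent with the maximum score of 1 listed for the benchmark, though the page does not spell out its scoring formula.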

Evaluation Stats
Total Models: 8
Organizations: 1
Verified Results: 0
Self-Reported: 8
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers

Score Distribution

Models: 8
Top Score: 99.0%
Average Score: 74.0%
High Performers (80%+): 3

Top Organizations

#1 OpenAI (8 models, 74.0%)
Leaderboard
8 models ranked by performance on COLLIE
Date           License      Score
Aug 7, 2025    Proprietary  99.0%
Jan 30, 2025   Proprietary  98.7%
Apr 16, 2025   Proprietary  98.4%
Feb 27, 2025   Proprietary  72.3%
Apr 14, 2025   Proprietary  65.8%
Aug 6, 2024    Proprietary  61.0%
Apr 14, 2025   Proprietary  54.6%
Apr 14, 2025   Proprietary  42.5%
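The summary figures in the Performance Overview can be checked against the eight leaderboard scores. The short Python sketch below recomputes them. The 80% threshold comes from the "High Performers (80%+)" label; rounding the average to one decimal place is an assumption about how the page displays it.

```python
# Leaderboard scores in percent, in ranked order.
scores = [99.0, 98.7, 98.4, 72.3, 65.8, 61.0, 54.6, 42.5]

top_score = max(scores)
# Assumption: the page rounds the mean to one decimal place.
average_score = round(sum(scores) / len(scores), 1)
# Models at or above the 80% threshold named in the overview.
high_performers = sum(1 for s in scores if s >= 80.0)

print(f"Top: {top_score}%  Average: {average_score}%  High performers: {high_performers}")
```

The scores sum to 592.3, so the mean is about 74.04, which rounds to the reported 74.0%. Three models score above 80%, matching the high-performer count.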
Resources