COLLIE
text
About
COLLIE is a framework for systematically evaluating constrained text generation in large language models. The benchmark tests whether models can generate text that satisfies compositional constraints, which vary in generation level and in the modeling challenges they pose. Its pipeline has four stages: constraint structure specification, example extraction, instruction rendering, and evaluation of model outputs against the specified constraints.
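As a rough illustration of how constraint specification, instruction rendering, and evaluation fit together, here is a minimal sketch in plain Python. It does not use the COLLIE library or its API; `Constraint`, `word_count`, `ends_with`, `render_instruction`, and `satisfies` are hypothetical names invented for this example, and example extraction from a corpus is left out.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Constraint:
    """Hypothetical constraint: an instruction fragment plus a checker."""
    instruction: str
    check: Callable[[str], bool]


def word_count(n: int) -> Constraint:
    # Constraint on the number of whitespace-separated words.
    return Constraint(
        instruction=f"contain exactly {n} words",
        check=lambda text: len(text.split()) == n,
    )


def ends_with(word: str) -> Constraint:
    # Constraint on the final word, ignoring trailing punctuation.
    return Constraint(
        instruction=f"end with the word '{word}'",
        check=lambda text: text.rstrip(".!?").split()[-1:] == [word],
    )


def render_instruction(level: str, constraints: list[Constraint]) -> str:
    # Compose the individual constraints into one natural-language prompt.
    parts = " and ".join(c.instruction for c in constraints)
    return f"Please generate a {level} that must {parts}."


def satisfies(text: str, constraints: list[Constraint]) -> bool:
    # A generation passes only if every composed constraint holds.
    return all(c.check(text) for c in constraints)


constraints = [word_count(4), ends_with("home")]
prompt = render_instruction("sentence", constraints)
passed = satisfies("The dog ran home.", constraints)
```

Under this sketch, a benchmark score would be the fraction of model generations for which `satisfies` returns `True`; how COLLIE itself aggregates results is not stated on this page.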
Evaluation Stats
Total Models: 8
Organizations: 1
Verified Results: 0
Self-Reported: 8
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers
Score Distribution: 8 models
Top Score: 99.0%
Average Score: 74.0%
High Performers (80%+): 3
Top Organizations
#1 OpenAI
8 models
74.0%
Leaderboard
8 models ranked by performance on COLLIE
Release Date | License | Score
---|---|---
Aug 7, 2025 | Proprietary | 99.0%
Jan 30, 2025 | Proprietary | 98.7%
Apr 16, 2025 | Proprietary | 98.4%
Feb 27, 2025 | Proprietary | 72.3%
Apr 14, 2025 | Proprietary | 65.8%
Aug 6, 2024 | Proprietary | 61.0%
Apr 14, 2025 | Proprietary | 54.6%
Apr 14, 2025 | Proprietary | 42.5%
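
The summary figures in the Performance Overview can be re-derived from the eight leaderboard scores. The short check below assumes the listed percentages are the complete set of results and that "High Performers" means a score of at least 80%; the variable names are illustrative only.

```python
from statistics import mean

# Scores (in percent) copied from the leaderboard rows, highest first.
scores = [99.0, 98.7, 98.4, 72.3, 65.8, 61.0, 54.6, 42.5]

top_score = max(scores)                              # best single result
average = round(mean(scores), 1)                     # mean, one decimal place
high_performers = sum(s >= 80.0 for s in scores)     # models at or above 80%

print(f"Top: {top_score}%  Average: {average}%  80%+: {high_performers}")
```

The mean of the eight values is 74.0375, which rounds to the 74.0% shown in the overview, and three models clear the 80% threshold.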