Include
text
About
Include is a specialized benchmark that evaluates how well AI models incorporate specified elements, requirements, or constraints into their outputs. It tests whether a model follows inclusion instructions, retains every required component, and covers all specified elements in its response.
Evaluation Stats
Total Models: 9
Organizations: 2
Verified Results: 0
Self-Reported: 9
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers
Score Distribution: 9 models
Top Score: 81.0%
Average Score: 64.8%
High Performers (80%+): 1
Top Organizations
#1 Alibaba Cloud / Qwen Team: 5 models, 78.4%
#2 Google: 4 models, 47.9%
Leaderboard
9 models ranked by performance on Include
Date | License | Score
---|---|---
Jul 25, 2025 | Apache 2.0 | 81.0%
Jul 22, 2025 | Apache 2.0 | 79.5%
Sep 10, 2025 | Apache 2.0 | 78.9%
Sep 10, 2025 | Apache 2.0 | 78.9%
Apr 29, 2025 | Apache 2.0 | 73.5%
May 20, 2025 | Gemma | 57.2%
Jun 26, 2025 | Proprietary | 57.2%
Jun 26, 2025 | Proprietary | 38.6%
May 20, 2025 | Gemma | 38.6%
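
The summary figures under Performance Overview can be recomputed from the leaderboard rows. The sketch below is illustrative only. The surviving rows carry no model or organization names, so it assumes that the five Apache 2.0 rows belong to Alibaba Cloud / Qwen Team and that the Gemma and Proprietary rows belong to Google. The page does not state this mapping, although it is consistent with the listed model counts (5 and 4). The helper `org_for_license` is hypothetical.

```python
from statistics import mean

# (license, score in percent), copied from the leaderboard rows
rows = [
    ("Apache 2.0", 81.0),
    ("Apache 2.0", 79.5),
    ("Apache 2.0", 78.9),
    ("Apache 2.0", 78.9),
    ("Apache 2.0", 73.5),
    ("Gemma", 57.2),
    ("Proprietary", 57.2),
    ("Proprietary", 38.6),
    ("Gemma", 38.6),
]

scores = [score for _, score in rows]
top_score = max(scores)
average_score = round(mean(scores), 1)
high_performers = sum(1 for score in scores if score >= 80.0)


def org_for_license(license_name: str) -> str:
    # ASSUMPTION: license is used as a stand-in for organization,
    # because the rows do not name the organization.
    if license_name == "Apache 2.0":
        return "Alibaba Cloud / Qwen Team"
    return "Google"


# Group scores by (assumed) organization, then average each group.
scores_by_org: dict[str, list[float]] = {}
for license_name, score in rows:
    scores_by_org.setdefault(org_for_license(license_name), []).append(score)

org_average = {org: round(mean(vals), 1) for org, vals in scores_by_org.items()}

print(top_score, average_score, high_performers)
print(org_average)
```

Under that assumption the recomputed values agree with the page: a top score of 81.0%, an average of 64.8%, one model at or above 80%, and per-organization averages of 78.4% and 47.9%.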