Include

text

About

Include is a specialized benchmark that evaluates how well AI models incorporate specified elements, requirements, or constraints into their outputs. It tests whether a model follows inclusion instructions, retains required components, and covers every specified element in its response.

Evaluation Stats

Total Models: 9

Organizations: 2

Verified Results: 0

Self-Reported: 9

Benchmark Details

Max Score: 1

Language

Performance Overview

Score distribution and top performers

Score Distribution

9 models

Top Score

81.0%

Average Score

64.8%

High Performers (80%+)

1

Top Organizations

#1 Alibaba Cloud / Qwen Team

5 models

78.4%

#2 Google

4 models

47.9%

Leaderboard

9 models ranked by performance on Include

Rank	Model	Organization	Date	License	Score
#01	Qwen3-235B-A22B-Thinking-2507	Alibaba Cloud / Qwen Team	Jul 25, 2025	Apache 2.0	81.0%
#02	Qwen3-235B-A22B-Instruct-2507	Alibaba Cloud / Qwen Team	Jul 22, 2025	Apache 2.0	79.5%
#03	Qwen3-Next-80B-A3B-Thinking	Alibaba Cloud / Qwen Team	Sep 10, 2025	Apache 2.0	78.9%
#04	Qwen3-Next-80B-A3B-Instruct	Alibaba Cloud / Qwen Team	Sep 10, 2025	Apache 2.0	78.9%
#05	Qwen3 235B A22B	Alibaba Cloud / Qwen Team	Apr 29, 2025	Apache 2.0	73.5%
#06	Gemma 3n E4B Instructed LiteRT Preview	Google	May 20, 2025	Gemma	57.2%
#07	Gemma 3n E4B Instructed	Google	Jun 26, 2025	Proprietary	57.2%
#08	Gemma 3n E2B Instructed	Google	Jun 26, 2025	Proprietary	38.6%
#09	Gemma 3n E2B Instructed LiteRT (Preview)	Google	May 20, 2025	Gemma	38.6%
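
The summary figures under Performance Overview can be cross-checked against the leaderboard rows. The following is a minimal sketch, not the site's actual aggregation method, which the page does not state. It assumes scores are averaged as plain arithmetic means and rounded to one decimal, and that the 80%+ cutoff is inclusive. The names `ROWS` and `summarize` are hypothetical, introduced only for illustration.

```python
from collections import defaultdict
from statistics import mean

# (model, organization, score in percent), copied from the leaderboard rows.
ROWS = [
    ("Qwen3-235B-A22B-Thinking-2507", "Alibaba Cloud / Qwen Team", 81.0),
    ("Qwen3-235B-A22B-Instruct-2507", "Alibaba Cloud / Qwen Team", 79.5),
    ("Qwen3-Next-80B-A3B-Thinking", "Alibaba Cloud / Qwen Team", 78.9),
    ("Qwen3-Next-80B-A3B-Instruct", "Alibaba Cloud / Qwen Team", 78.9),
    ("Qwen3 235B A22B", "Alibaba Cloud / Qwen Team", 73.5),
    ("Gemma 3n E4B Instructed LiteRT Preview", "Google", 57.2),
    ("Gemma 3n E4B Instructed", "Google", 57.2),
    ("Gemma 3n E2B Instructed", "Google", 38.6),
    ("Gemma 3n E2B Instructed LiteRT (Preview)", "Google", 38.6),
]


def summarize(rows, high_cutoff=80.0):
    """Recompute the overview statistics from (model, org, score) rows.

    Assumption: simple means rounded to one decimal; inclusive cutoff.
    """
    scores = [score for _, _, score in rows]

    # Group scores by organization to get the per-organization averages.
    by_org = defaultdict(list)
    for _, org, score in rows:
        by_org[org].append(score)

    return {
        "top": max(scores),
        "average": round(mean(scores), 1),
        "high_performers": sum(score >= high_cutoff for score in scores),
        "org_average": {org: round(mean(v), 1) for org, v in by_org.items()},
    }


summary = summarize(ROWS)
print(summary)
```

Under these assumptions the recomputed values agree with the page: a top score of 81.0%, an overall average of 64.8%, and organization averages of 78.4% and 47.9%.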