Creative Writing v3

text

About

Creative Writing v3 is an LLM-judged benchmark that evaluates the creative writing capabilities of large language models using rubric scoring and Elo ratings. A judge model such as Sonnet 4 assesses writing quality across several dimensions, including style, originality, coherence, and engagement. The benchmark measures a model's ability to produce compelling, creative content while avoiding repetition and maintaining a high literary standard.
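The page does not describe how the benchmark computes its Elo ratings, so the following is only a minimal sketch of the textbook Elo update applied to one pairwise comparison between two models. The function names, the K-factor of 32, and the 400-point scale are assumptions for illustration, not the benchmark's actual settings.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, outcome_a: float, k: float = 32.0):
    """Return updated (rating_a, rating_b) after one comparison.

    outcome_a is 1.0 if the judge prefers A's text, 0.0 if it prefers B's,
    and 0.5 for a tie. k (assumed 32 here) controls the step size.
    """
    exp_a = expected_score(rating_a, rating_b)
    exp_b = 1.0 - exp_a
    new_a = rating_a + k * (outcome_a - exp_a)
    new_b = rating_b + k * ((1.0 - outcome_a) - exp_b)
    return new_a, new_b


# Two equally rated models; the judge prefers model A's story.
new_a, new_b = update_elo(1500.0, 1500.0, outcome_a=1.0)
print(new_a, new_b)
```

In this scheme the winner gains exactly what the loser gives up, so the total rating across the pair stays constant.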

Evaluation Stats

Total Models: 3

Organizations: 1

Verified Results: 0

Self-Reported: 3

Benchmark Details

Max Score: 1

Language

Performance Overview

Score distribution and top performers

Score Distribution

3 models

Top Score

87.5%

Average Score

86.3%

High Performers (80%+)

Top Organizations

#1 Alibaba Cloud / Qwen Team

3 models

86.3%

Leaderboard

3 models ranked by performance on Creative Writing v3

Rank	Model	Organization	Date	License	Score
#01	Qwen3-235B-A22B-Instruct-2507	Alibaba Cloud / Qwen Team	Jul 22, 2025	Apache 2.0	87.5%
#02	Qwen3-235B-A22B-Thinking-2507	Alibaba Cloud / Qwen Team	Jul 25, 2025	Apache 2.0	86.1%
#03	Qwen3-Next-80B-A3B-Instruct	Alibaba Cloud / Qwen Team	Sep 10, 2025	Apache 2.0	85.3%
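The summary figures in the Performance Overview can be checked against the leaderboard rows. The sketch below recomputes them from the three listed scores; the variable names and the 80% threshold constant are illustrative, and the scores are copied from the table as percentages.

```python
# Scores (in percent) as listed on the leaderboard.
scores = {
    "Qwen3-235B-A22B-Instruct-2507": 87.5,
    "Qwen3-235B-A22B-Thinking-2507": 86.1,
    "Qwen3-Next-80B-A3B-Instruct": 85.3,
}

HIGH_PERFORMER_THRESHOLD = 80.0  # the "80%+" cutoff named on the page

top_model = max(scores, key=scores.get)
top_score = scores[top_model]
average_score = round(sum(scores.values()) / len(scores), 1)
high_performers = [m for m, s in scores.items() if s >= HIGH_PERFORMER_THRESHOLD]

print(f"Top: {top_model} at {top_score}%")
print(f"Average: {average_score}%")
print(f"High performers (80%+): {len(high_performers)}")
```

Since all three models come from one organization, the organization average is the same as the overall average.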
