IFEval
About
IFEval (Instruction-Following Evaluation) is a straightforward benchmark that evaluates Large Language Models' ability to follow verifiable instructions such as word count requirements, formatting constraints, and specific content guidelines. This benchmark focuses on measurable instruction adherence through objective criteria, testing models' capability to understand and comply with explicit directives and constraints.
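The key property of such instructions is that adherence can be checked programmatically. As an illustrative sketch (the function names and checks below are hypothetical, not the benchmark's actual implementation), a verifier only needs deterministic predicates over the response text:

```python
import re

# Hypothetical IFEval-style verifiable instructions: each check is a
# deterministic function of the response text alone, so adherence can be
# scored objectively without a human or model judge.

def check_min_words(response: str, min_words: int) -> bool:
    """Verify a word-count requirement, e.g. 'answer in at least 50 words'."""
    return len(response.split()) >= min_words

def check_all_lowercase(response: str) -> bool:
    """Verify a formatting constraint, e.g. 'respond in all lowercase'."""
    return response == response.lower()

def check_num_bullets(response: str, n: int) -> bool:
    """Verify a structural constraint, e.g. 'use exactly 3 bullet points'."""
    return len(re.findall(r"^\s*[-*]", response, flags=re.MULTILINE)) == n

def score(response: str, checks) -> float:
    """Fraction of instructions followed (per-instruction accuracy)."""
    results = [check(response) for check in checks]
    return sum(results) / len(results)

reply = "- first point\n- second point\n- third point"
checks = [
    lambda r: check_min_words(r, 5),
    check_all_lowercase,
    lambda r: check_num_bullets(r, 3),
]
print(score(reply, checks))  # 1.0 when every constraint is satisfied
```

Because every check is objective, a model's score is simply the fraction of instructions it satisfies across the prompt set.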
Evaluation Stats
Total Models: 40
Organizations: 12
Verified Results: 0
Self-Reported: 40
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers
Score Distribution: 40 models
Top Score: 93.9%
Average Score: 83.7%
High Performers (80%+): 31

Top Organizations
#1 Anthropic (1 model): 93.2%
#2 Amazon (3 models): 89.7%
#3 Moonshot AI (3 models): 88.9%
#4 Google (4 models): 87.4%
#5 DeepSeek (1 model): 86.1%
Leaderboard
40 models ranked by performance on IFEval
Release Date | License | Score
---|---|---
Jan 30, 2025 | Proprietary | 93.9%
Feb 24, 2025 | Proprietary | 93.2%
Dec 6, 2024 | Llama 3.3 Community License Agreement | 92.1%
Nov 20, 2024 | Proprietary | 92.1%
Mar 12, 2025 | Gemma | 90.4%
Mar 12, 2025 | Gemma | 90.2%
Jul 11, 2025 | MIT | 89.8%
Sep 5, 2025 | MIT | 89.8%
Nov 20, 2024 | Proprietary | 89.7%
Apr 7, 2025 | Llama 3.1 Community License | 89.5%
Showing 1 to 10 of 40 models