IFEval

About

IFEval (Instruction-Following Evaluation) is a benchmark that measures large language models' ability to follow verifiable instructions, such as word-count requirements, formatting constraints, and specific content guidelines. Because every instruction can be checked against objective criteria, the benchmark measures instruction adherence programmatically, testing whether models understand and comply with explicit directives and constraints.
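
To illustrate what "verifiable" means in practice, the Python sketch below shows how such instructions can be checked programmatically. This is a minimal, hypothetical example, not the official IFEval harness; the checker functions and instruction names are assumptions for illustration only.

import re

def check_word_count(response: str, min_words: int) -> bool:
    # Verifies a word-count instruction, e.g. "write at least 100 words".
    return len(response.split()) >= min_words

def check_all_lowercase(response: str) -> bool:
    # Verifies a formatting instruction, e.g. "respond in all lowercase".
    return response == response.lower()

def check_num_bullets(response: str, exact_bullets: int) -> bool:
    # Verifies a structural instruction, e.g. "use exactly 3 bullet points".
    bullets = re.findall(r"^\s*[-*]\s+", response, flags=re.MULTILINE)
    return len(bullets) == exact_bullets

def follows_all(response: str, checkers) -> bool:
    # A response counts as compliant only if every attached check passes.
    return all(check(response) for check in checkers)

response = "- first point\n- second point\n- third point"
checkers = [
    lambda r: check_word_count(r, 5),
    check_all_lowercase,
    lambda r: check_num_bullets(r, 3),
]
print(follows_all(response, checkers))  # True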

Evaluation Stats

Total Models: 41
Organizations: 12
Verified Results: 0
Self-Reported: 41
Benchmark Details

Max Score: 1
Language: en
Performance Overview
Score distribution and top performers

Score Distribution (41 models)

Top Score: 93.9%
Average Score: 83.9%
High Performers (80%+): 32
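
For reference, aggregates like these can be reproduced from the raw score list in a few lines of Python. This is a minimal sketch; the score values below are placeholders, not the actual 41 leaderboard entries.

scores = [93.9, 93.2, 92.1, 92.1, 90.4]  # placeholder; the real list has 41 entries

top_score = max(scores)
average_score = sum(scores) / len(scores)
high_performers = sum(1 for s in scores if s >= 80.0)  # the "80%+" bucket

print(f"Top: {top_score:.1f}%  Avg: {average_score:.1f}%  80%+: {high_performers}")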

Top Organizations

#1 Anthropic (1 model): 93.2%
#2 Amazon (3 models): 89.7%
#3 Moonshot AI (3 models): 88.9%
#4 Google (4 models): 87.4%
#5 NVIDIA (3 models): 86.4%
Leaderboard
41 models ranked by performance on IFEval
Rank  Release Date  License                                Score
1     Jan 30, 2025  Proprietary                            93.9%
2     Feb 24, 2025  Proprietary                            93.2%
3     Nov 20, 2024  Proprietary                            92.1%
4     Dec 6, 2024   Llama 3.3 Community License Agreement  92.1%
5     Mar 12, 2025  Gemma                                  90.4%
6     Aug 18, 2025  NVIDIA Open Model License Agreement    90.3%
7     Mar 12, 2025  Gemma                                  90.2%
8     Jul 11, 2025  MIT                                    89.8%
9     Sep 5, 2025   MIT                                    89.8%
10    Nov 20, 2024  Proprietary                            89.7%
Showing 1 to 10 of 41 models
...
Resources