IFEval

About

IFEval (Instruction-Following Evaluation) is a straightforward benchmark that evaluates Large Language Models' ability to follow verifiable instructions, such as word-count requirements, formatting constraints, and specific content guidelines. Because every instruction can be checked against objective criteria, the benchmark measures adherence to explicit directives without relying on subjective judgment.

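Since each instruction is verifiable, a response can be scored by running mechanical checks rather than by a judge model or human grader. As a minimal sketch of the idea in Python (the constraint set and function names below are illustrative assumptions, not IFEval's actual implementation), verifiers for three such instructions could look like this:

    import re

    def check_word_count(response: str, min_words: int) -> bool:
        # Verifiable instruction: "Answer with at least N words."
        return len(response.split()) >= min_words

    def check_all_caps(response: str) -> bool:
        # Verifiable instruction: "Your entire response must be in capital letters."
        return response == response.upper()

    def check_bullet_count(response: str, exact: int) -> bool:
        # Verifiable instruction: "Your answer must contain exactly N bullet points."
        return len(re.findall(r"^\s*[*-] ", response, flags=re.MULTILINE)) == exact

    # A response passes only if every instruction attached to the prompt verifies.
    response = "* FIRST POINT ABOUT THE TOPIC\n* SECOND POINT ABOUT THE TOPIC"
    checks = [
        check_word_count(response, 8),
        check_all_caps(response),
        check_bullet_count(response, 2),
    ]
    print(all(checks))  # True

A model's overall result is then the fraction of such checks it satisfies across the prompt set, which is why the maximum score below is 1 and results are reported as percentages.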

Evaluation Stats

Total Models: 40
Organizations: 12
Verified Results: 0
Self-Reported: 40

Benchmark Details

Max Score: 1
Language: en

Performance Overview
Score distribution and top performers

Score Distribution (40 models)
Top Score: 93.9%
Average Score: 83.7%
High Performers (80%+): 31

Top Organizations

#1 Anthropic (1 model): 93.2%
#2 Amazon (3 models): 89.7%
#3 Moonshot AI (3 models): 88.9%
#4 Google (4 models): 87.4%
#5 DeepSeek (1 model): 86.1%

Leaderboard
40 models ranked by performance on IFEval
Date           License                                  Score
Jan 30, 2025   Proprietary                              93.9%
Feb 24, 2025   Proprietary                              93.2%
Dec 6, 2024    Llama 3.3 Community License Agreement    92.1%
Nov 20, 2024   Proprietary                              92.1%
Mar 12, 2025   Gemma                                    90.4%
Mar 12, 2025   Gemma                                    90.2%
Jul 11, 2025   MIT                                      89.8%
Sep 5, 2025    MIT                                      89.8%
Nov 20, 2024   Proprietary                              89.7%
Apr 7, 2025    Llama 3.1 Community License              89.5%

Showing 1 to 10 of 40 models

Resources