IFEval
About
IFEval (Instruction-Following Evaluation) is a straightforward benchmark that evaluates Large Language Models' ability to follow verifiable instructions such as word count requirements, formatting constraints, and specific content guidelines. This benchmark focuses on measurable instruction adherence through objective criteria, testing models' capability to understand and comply with explicit directives and constraints.
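The key property of such instructions is that adherence can be checked programmatically. As an illustrative sketch (the function names and checks below are hypothetical, not the benchmark's actual implementation), a verifier only needs deterministic predicates over the response text:

```python
import re

# Hypothetical IFEval-style verifiable instructions: each check is a
# deterministic function of the response text alone, so adherence can be
# scored objectively without a human or model judge.

def check_min_words(response: str, min_words: int) -> bool:
    """Verify a word-count requirement, e.g. 'answer in at least 50 words'."""
    return len(response.split()) >= min_words

def check_all_lowercase(response: str) -> bool:
    """Verify a formatting constraint, e.g. 'respond in all lowercase'."""
    return response == response.lower()

def check_num_bullets(response: str, n: int) -> bool:
    """Verify a structural constraint, e.g. 'use exactly 3 bullet points'."""
    return len(re.findall(r"^\s*[-*]", response, flags=re.MULTILINE)) == n

def score(response: str, checks) -> float:
    """Fraction of instructions followed (per-instruction accuracy)."""
    results = [check(response) for check in checks]
    return sum(results) / len(results)

reply = "- first point\n- second point\n- third point"
checks = [
    lambda r: check_min_words(r, 5),
    check_all_lowercase,
    lambda r: check_num_bullets(r, 3),
]
print(score(reply, checks))  # 1.0 when every constraint is satisfied
```

Because every check is objective, a model's score is simply the fraction of instructions it satisfies across the prompt set.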
Evaluation Stats
Total Models: 40
Organizations: 12
Verified Results: 0
Self-Reported: 40
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers
Score Distribution: 40 models
Top Score: 93.9%
Average Score: 83.7%
High Performers (80%+): 31

Top Organizations
#1 Anthropic (1 model): 93.2%
#2 Amazon (3 models): 89.7%
#3 Moonshot AI (3 models): 88.9%
#4 Google (4 models): 87.4%
#5 DeepSeek (1 model): 86.1%
Leaderboard
40 models ranked by performance on IFEval
Release Date | License | Score
---|---|---
Jan 30, 2025 | Proprietary | 93.9%
Feb 24, 2025 | Proprietary | 93.2%
Dec 6, 2024 | Llama 3.3 Community License Agreement | 92.1%
Nov 20, 2024 | Proprietary | 92.1%
Mar 12, 2025 | Gemma | 90.4%
Mar 12, 2025 | Gemma | 90.2%
Jul 11, 2025 | MIT | 89.8%
Sep 5, 2025 | MIT | 89.8%
Nov 20, 2024 | Proprietary | 89.7%
Apr 7, 2025 | Llama 3.1 Community License | 89.5%
Showing 1 to 10 of 40 models