IFEval
About
IFEval (Instruction-Following Evaluation) is a benchmark that measures large language models' ability to follow verifiable instructions, such as word-count requirements, formatting constraints, and specific content guidelines. Because every instruction can be checked against objective criteria, the benchmark scores instruction adherence directly, without subjective judging.
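To make "verifiable" concrete, here is a minimal sketch in Python of how such a check might work. This is not the official IFEval harness; the checker functions and the scoring loop below are illustrative assumptions. The idea is simply that each prompt carries one or more constraints, and a response is scored by which constraints it objectively satisfies.

```python
import re

# Illustrative constraint checkers (hypothetical names, not IFEval's API).

def check_word_count(response: str, min_words: int) -> bool:
    """Verify a length constraint: at least min_words words."""
    return len(response.split()) >= min_words

def check_all_lowercase(response: str) -> bool:
    """Verify a formatting constraint: no uppercase characters."""
    return response == response.lower()

def check_contains_keyword(response: str, keyword: str) -> bool:
    """Verify a content constraint: the keyword must appear as a word."""
    return re.search(rf"\b{re.escape(keyword)}\b", response, re.IGNORECASE) is not None

# Score = fraction of instructions the response satisfies; every check is
# deterministic, so no human or LLM judge is needed.
response = "the quick brown fox jumps over the lazy dog"
checks = [
    check_word_count(response, 5),
    check_all_lowercase(response),
    check_contains_keyword(response, "fox"),
]
print(f"instruction-following score: {sum(checks) / len(checks):.2f}")
```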
Evaluation Stats
Total Models: 41
Organizations: 12
Verified Results: 0
Self-Reported: 41
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers
Score Distribution: 41 models
Top Score: 93.9%
Average Score: 83.9%
High Performers (80%+): 32

Top Organizations
#1 Anthropic: 1 model, 93.2%
#2 Amazon: 3 models, 89.7%
#3 Moonshot AI: 3 models, 88.9%
#4 Google: 4 models, 87.4%
#5 NVIDIA: 3 models, 86.4%
Leaderboard
41 models ranked by performance on IFEval
| Release Date | License | Score |
|---|---|---|
| Jan 30, 2025 | Proprietary | 93.9% |
| Feb 24, 2025 | Proprietary | 93.2% |
| Nov 20, 2024 | Proprietary | 92.1% |
| Dec 6, 2024 | Llama 3.3 Community License Agreement | 92.1% |
| Mar 12, 2025 | Gemma | 90.4% |
| Aug 18, 2025 | NVIDIA Open Model License Agreement | 90.3% |
| Mar 12, 2025 | Gemma | 90.2% |
| Jul 11, 2025 | MIT | 89.8% |
| Sep 5, 2025 | MIT | 89.8% |
| Nov 20, 2024 | Proprietary | 89.7% |
Showing the top 10 of 41 models
...