Multi-IF

Tags: Multilingual, text
About

Multi-IF is a multilingual instruction-following benchmark that evaluates language models' ability to follow complex instructions across multiple languages. It probes instruction comprehension, multilingual understanding, and task execution in diverse linguistic contexts, giving a comprehensive picture of cross-lingual instruction-following performance.
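
Benchmarks of this kind are typically scored by checking each model response against verifiable constraints attached to the prompt and averaging the fraction satisfied. The sketch below is illustrative only, not the official Multi-IF harness; all function and field names (Example, keyword_check, word_limit_check, score) are assumptions.

```python
# Minimal sketch (assumed, not the official Multi-IF harness): scoring an
# instruction-following benchmark where each prompt carries rule-based,
# verifiable constraints and the score is the fraction satisfied.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Example:
    prompt: str                           # instruction given to the model
    checks: List[Callable[[str], bool]]   # constraint verifiers for the response

def keyword_check(keyword: str) -> Callable[[str], bool]:
    """Constraint: the response must contain a given keyword."""
    return lambda response: keyword.lower() in response.lower()

def word_limit_check(max_words: int) -> Callable[[str], bool]:
    """Constraint: the response must stay under a word limit."""
    return lambda response: len(response.split()) <= max_words

def score(examples: List[Example], generate: Callable[[str], str]) -> float:
    """Average fraction of constraints satisfied, on a 0-1 scale
    (matching the benchmark's max score of 1; shown as % on this page)."""
    per_example = []
    for ex in examples:
        response = generate(ex.prompt)
        satisfied = sum(check(response) for check in ex.checks)
        per_example.append(satisfied / len(ex.checks))
    return sum(per_example) / len(per_example)

if __name__ == "__main__":
    examples = [
        Example("Describe Paris in under 30 words and mention the Seine.",
                [keyword_check("Seine"), word_limit_check(30)]),
    ]
    toy_model = lambda prompt: "Paris sits on the Seine, compact and lively."
    print(f"score = {score(examples, toy_model):.3f}")
```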

Evaluation Stats
Total Models: 11
Organizations: 2
Verified Results: 0
Self-Reported: 11
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers

Score Distribution (11 models)
Top Score: 80.6%
Average Score: 71.8%
High Performers (80%+): 1

Top Organizations
#1 Alibaba Cloud / Qwen Team: 5 models, 76.8% avg. score
#2 OpenAI: 6 models, 67.7% avg. score
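
The headline numbers above are simple aggregates of per-model scores. A minimal sketch of that aggregation is shown below; the score list holds only the ten results visible in the leaderboard (the page's own stats cover all 11 models), so the printed average differs slightly from the 71.8% reported above.

```python
# Sketch of the aggregation behind the "Performance Overview" numbers.
# Scores are the ten leaderboard entries shown on this page, as fractions
# of the benchmark's max score of 1 (the 11th model is not listed here).
scores = [0.806, 0.795, 0.778, 0.775, 0.758, 0.722, 0.708, 0.708, 0.670, 0.609]

top_score = max(scores)
average = sum(scores) / len(scores)
high_performers = sum(1 for s in scores if s >= 0.80)

print(f"Top score:       {top_score:.1%}")   # 80.6%
print(f"Average score:   {average:.1%}")     # over the visible 10 models only
print(f"High performers: {high_performers}") # models at or above 80%
```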
Leaderboard
11 models ranked by performance on Multi-IF
Date          License      Score
Jul 25, 2025  Apache 2.0   80.6%
Jan 30, 2025  Proprietary  79.5%
Sep 10, 2025  Apache 2.0   77.8%
Jul 22, 2025  Apache 2.0   77.5%
Sep 10, 2025  Apache 2.0   75.8%
Apr 29, 2025  Apache 2.0   72.2%
Feb 27, 2025  Proprietary  70.8%
Apr 14, 2025  Proprietary  70.8%
Apr 14, 2025  Proprietary  67.0%
Aug 6, 2024   Proprietary  60.9%
Showing 1 to 10 of 11 models
Resources