Multi-IF
Multilingual · text
About
Multi-IF is a multilingual instruction-following benchmark that evaluates language models' ability to follow complex instructions across multiple languages. It tests instruction comprehension, multilingual understanding, and task execution in diverse linguistic contexts, giving a cross-lingual view of instruction-following performance.
Evaluation Stats
Total Models: 11
Organizations: 2
Verified Results: 0
Self-Reported: 11
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers

Score Distribution: 11 models
Top Score: 80.6%
Average Score: 71.8%
High Performers (80%+): 1

Top Organizations
#1 Alibaba Cloud / Qwen Team: 5 models, 76.8%
#2 OpenAI: 6 models, 67.7%
Leaderboard
11 models ranked by performance on Multi-IF

Date | License | Score
---|---|---
Jul 25, 2025 | Apache 2.0 | 80.6%
Jan 30, 2025 | Proprietary | 79.5%
Sep 10, 2025 | Apache 2.0 | 77.8%
Jul 22, 2025 | Apache 2.0 | 77.5%
Sep 10, 2025 | Apache 2.0 | 75.8%
Apr 29, 2025 | Apache 2.0 | 72.2%
Feb 27, 2025 | Proprietary | 70.8%
Apr 14, 2025 | Proprietary | 70.8%
Apr 14, 2025 | Proprietary | 67.0%
Aug 6, 2024 | Proprietary | 60.9%

Showing 1 to 10 of 11 models
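The summary figures in the Performance Overview above are simple aggregates of the leaderboard scores. A minimal sketch of how they can be recomputed, using only the ten scores visible in the table (the eleventh model is not shown on this page, so the mean here differs slightly from the page's 71.8% average over all 11 models):

```python
# Scores (%) for the ten models shown in the leaderboard table above.
# Note: one of the 11 models is not listed on this page.
scores = [80.6, 79.5, 77.8, 77.5, 75.8, 72.2, 70.8, 70.8, 67.0, 60.9]

top_score = max(scores)                           # best result: 80.6
average = sum(scores) / len(scores)               # mean over the visible rows
high_performers = sum(s >= 80.0 for s in scores)  # models scoring 80%+: 1

print(f"Top: {top_score}%  Avg (visible 10): {average:.1f}%  80%+: {high_performers}")
```

The 80%+ count of 1 matches the overview; the full-set average of 71.8% additionally includes the unlisted eleventh model.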