BFCL-v3

About

BFCL v3 (Berkeley Function Calling Leaderboard, version 3) measures how accurately LLMs call functions. It validates function calls with Abstract Syntax Tree (AST) evaluation across multiple programming languages, checking the structure of each call rather than comparing raw strings. It assesses both serial and parallel function calls, giving finer-grained insight into LLM tool-use capabilities and error patterns.
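To illustrate what AST-based validation of a function call can look like, here is a minimal sketch using Python's standard `ast` module. It is not the benchmark's actual checker: the helper names `parse_call` and `call_matches`, the `get_weather` function, and the "list of allowed values per argument" format are all assumptions made for this example, and only keyword arguments with literal values are handled.

```python
import ast


def parse_call(source: str) -> tuple[str, dict]:
    """Parse one Python-style call into (function name, keyword arguments)."""
    node = ast.parse(source.strip(), mode="eval").body
    if not isinstance(node, ast.Call):
        raise ValueError("expected a single function call")
    if node.args:
        raise ValueError("this sketch only handles keyword arguments")
    name = ast.unparse(node.func)  # also covers dotted names such as math.sqrt
    # literal_eval rejects anything that is not a plain literal (raises ValueError)
    kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
    return name, kwargs


def call_matches(model_output: str, expected_name: str, allowed: dict) -> bool:
    """True when the name matches and every argument takes an allowed value.

    `allowed` maps each argument name to a list of acceptable values
    (a hypothetical ground-truth format chosen for this sketch).
    """
    try:
        name, kwargs = parse_call(model_output)
    except (SyntaxError, ValueError):
        return False  # unparsable or unsupported output counts as a failure
    if name != expected_name or set(kwargs) != set(allowed):
        return False
    return all(kwargs[key] in allowed[key] for key in allowed)


# Hypothetical ground truth for a hypothetical get_weather tool
truth = {"city": ["Paris"], "unit": ["celsius", "C"]}
print(call_matches('get_weather(unit="C", city="Paris")', "get_weather", truth))
```

Because the comparison runs on the parsed structure, argument order and whitespace do not affect the result, whereas a wrong function name, a missing argument, or a syntactically broken call is rejected.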

Evaluation Stats
Total Models: 6
Organizations: 2
Verified Results: 0
Self-Reported: 6
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers

Score Distribution

Models: 6
Top Score: 77.8%
Average Score: 73.2%
High Performers (80%+): 0

Top Organizations

#1 Zhipu AI (2 models): 77.1%
#2 Alibaba Cloud / Qwen Team (4 models): 71.3%
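The summary figures can be reproduced from the six leaderboard scores. The sketch below assumes that the per-organization percentages are simple means and that the two MIT-licensed entries belong to Zhipu AI while the four Apache 2.0 entries belong to the Qwen team. That grouping is an inference from the model counts, not something the page states.

```python
from statistics import mean

# Scores copied from the leaderboard. The grouping by organization is an
# assumption inferred from the model counts and licenses.
scores = {
    "Zhipu AI": [77.8, 76.4],
    "Alibaba Cloud / Qwen Team": [72.0, 71.9, 70.9, 70.3],
}

all_scores = [s for group in scores.values() for s in group]

top_score = max(all_scores)
average_score = round(mean(all_scores), 1)
high_performers = sum(s >= 80.0 for s in all_scores)
org_averages = {org: round(mean(vals), 1) for org, vals in scores.items()}

print(f"Top score: {top_score}%")
print(f"Average score: {average_score}%")
print(f"High performers (80%+): {high_performers}")
for org, avg in org_averages.items():
    print(f"{org}: {avg}%")
```

Under these assumptions the computed values agree with the figures reported on the page: an overall mean of 73.2%, organization means of 77.1% and 71.3%, and no model at or above 80%.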
Leaderboard
6 models ranked by performance on BFCL-v3
Rank  Date          License     Score
1     Jul 28, 2025  MIT         77.8%
2     Jul 28, 2025  MIT         76.4%
3     Sep 10, 2025  Apache 2.0  72.0%
4     Jul 25, 2025  Apache 2.0  71.9%
5     Jul 22, 2025  Apache 2.0  70.9%
6     Sep 10, 2025  Apache 2.0  70.3%
Resources