BFCL-v3
About
BFCL v3 (Berkeley Function Calling Leaderboard, version 3) measures how well large language models call functions and tools. It extends earlier versions with more demanding test scenarios and more precise accuracy measurement. Function calls are validated with Abstract Syntax Tree (AST) evaluation: the model's generated call is parsed and its function name and arguments are checked against the expected call, across multiple programming languages. BFCL v3 covers both serial and parallel function calls and reports results at a finer granularity, which exposes specific error patterns in LLM tool use.
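To illustrate the idea behind AST evaluation, here is a minimal sketch in Python. It is not BFCL's actual checker. It assumes the model emits a Python call expression with keyword arguments only, and that parallel calls are written as a list such as `[f(a=1), g(b=2)]`. The helper names `parse_calls` and `calls_match`, and the `ordered` flag used to distinguish serial (order matters) from parallel (order does not matter) calls, are hypothetical.

```python
import ast
from typing import Any


def parse_calls(source: str) -> list[tuple[str, dict[str, Any]]]:
    """Hypothetical helper: parse "f(a=1)" or "[f(a=1), g(b=2)]" into (name, kwargs) pairs."""
    node = ast.parse(source.strip(), mode="eval").body
    # Assumption: several calls in a list or tuple represent parallel calls.
    call_nodes = node.elts if isinstance(node, (ast.List, ast.Tuple)) else [node]
    calls = []
    for call in call_nodes:
        if not isinstance(call, ast.Call) or call.args:
            raise ValueError("expected function calls with keyword arguments only")
        name = ast.unparse(call.func)  # keeps dotted names such as "math.sqrt"
        # literal_eval accepts only literals, so arbitrary code is never executed.
        kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in call.keywords}
        calls.append((name, kwargs))
    return calls


def calls_match(predicted: str, expected: str, ordered: bool = False) -> bool:
    """Compare parsed calls structurally instead of as raw strings."""
    pred, gold = parse_calls(predicted), parse_calls(expected)
    if ordered:
        # Serial calls (assumed): same calls in the same order.
        return pred == gold
    if len(pred) != len(gold):
        return False
    # Parallel calls: each expected call must be matched exactly once, in any order.
    remaining = list(gold)
    for call in pred:
        if call not in remaining:
            return False
        remaining.remove(call)
    return True


print(calls_match("get_weather(city='Paris', unit='celsius')",
                  "get_weather(unit='celsius', city='Paris')"))
```

Because arguments are compared as parsed values rather than text, keyword order and whitespace differences do not cause false mismatches. Real graders typically also check argument types and may accept several valid values per argument; this sketch uses exact equality only.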
Evaluation Stats
Total models: 6
Organizations: 2
Verified results: 0
Self-reported: 6
Benchmark Details
Max score: 1
Language: en
Performance Overview
Score distribution and top performers

Score Distribution (6 models)
Top score: 77.8%
Average score: 73.2%
High performers (80%+): 0

Top Organizations
#1 Zhipu AI: 2 models, average 77.1%
#2 Alibaba Cloud / Qwen Team: 4 models, average 71.3%
Leaderboard
6 models ranked by performance on BFCL-v3
Rank | Date | License | Score
---|---|---|---
1 | Jul 28, 2025 | MIT | 77.8%
2 | Jul 28, 2025 | MIT | 76.4%
3 | Sep 10, 2025 | Apache 2.0 | 72.0%
4 | Jul 25, 2025 | Apache 2.0 | 71.9%
5 | Jul 22, 2025 | Apache 2.0 | 70.9%
6 | Sep 10, 2025 | Apache 2.0 | 70.3%
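The summary figures in the Performance Overview can be recomputed from the six leaderboard scores, as in the short Python sketch below. One assumption is involved: the table does not name the models, so assigning the two MIT-licensed rows to Zhipu AI and the four Apache 2.0 rows to Alibaba Cloud / Qwen Team is inferred from the per-organization model counts, not stated by the source.

```python
from statistics import mean

# Scores (%) from the leaderboard rows, in rank order.
scores = [77.8, 76.4, 72.0, 71.9, 70.9, 70.3]

top = max(scores)
average = round(mean(scores), 1)
high_performers = sum(1 for s in scores if s >= 80.0)

# Assumption: rows 1-2 (MIT) belong to Zhipu AI and rows 3-6 (Apache 2.0)
# belong to Alibaba Cloud / Qwen Team; the table itself does not say so.
by_org = {
    "Zhipu AI": scores[:2],
    "Alibaba Cloud / Qwen Team": scores[2:],
}
org_averages = {org: round(mean(vals), 1) for org, vals in by_org.items()}

print(top, average, high_performers, org_averages)
```

Under that grouping, the per-organization means reproduce the 77.1% and 71.3% figures listed under Top Organizations, and the overall mean reproduces the 73.2% average.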