BFCL-v3

text

About

BFCL v3 is the third version of the Berkeley Function Calling Leaderboard, a benchmark that measures how accurately LLMs call functions (tools). It uses Abstract Syntax Tree (AST) evaluation to validate function calls precisely across multiple programming languages. It assesses both serial and parallel function calls, giving more granular insight into LLM tool-use capabilities and error patterns.
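To make the idea of AST-based validation concrete, here is a minimal sketch in Python. It is not the official BFCL checker: the function names (`parse_call`, `call_matches`, `parallel_matches`) and the "allowed values per parameter" format are assumptions made for illustration. It only handles Python-style calls with keyword arguments and literal values.

```python
import ast


def parse_call(source):
    """Parse one Python-style call, e.g. f(a=1, b="x"), into (name, kwargs)."""
    node = ast.parse(source.strip(), mode="eval").body
    if not isinstance(node, ast.Call):
        raise ValueError("expected a single function call")
    if node.args:
        raise ValueError("positional arguments are not handled in this sketch")
    name = ast.unparse(node.func)  # requires Python 3.9+
    # literal_eval raises ValueError for anything that is not a plain literal.
    kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
    return name, kwargs


def call_matches(source, expected_name, allowed):
    """Check one call. `allowed` maps each parameter to its acceptable values."""
    try:
        name, kwargs = parse_call(source)
    except (SyntaxError, ValueError):
        return False  # unparseable output counts as a failed call
    if name != expected_name or set(kwargs) != set(allowed):
        return False
    return all(kwargs[param] in allowed[param] for param in allowed)


def parallel_matches(source, expected):
    """Check a list of calls, e.g. [f(x=1), g(y=2)], ignoring order.

    `expected` is a list of (name, allowed) pairs. Matching is greedy, which
    is enough for a sketch but not for overlapping expectations.
    """
    try:
        node = ast.parse(source.strip(), mode="eval").body
    except SyntaxError:
        return False
    elts = node.elts if isinstance(node, (ast.List, ast.Tuple)) else [node]
    remaining = [ast.unparse(e) for e in elts]
    if len(remaining) != len(expected):
        return False
    for name, allowed in expected:
        hit = next((c for c in remaining if call_matches(c, name, allowed)), None)
        if hit is None:
            return False
        remaining.remove(hit)
    return True
```

Comparing parsed structure rather than raw strings means that differences in whitespace, quoting style, or argument order do not cause a correct call to be marked wrong.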

Evaluation Stats

Total Models: 6

Organizations: 2

Verified Results: 0

Self-Reported: 6

Benchmark Details

Max Score: 1

Language

Performance Overview

Score distribution and top performers

Score Distribution: 6 models

Top Score: 77.8%

Average Score: 73.2%

High Performers (80%+): 0

Top Organizations

#1 Zhipu AI: 2 models, average score 77.1%

#2 Alibaba Cloud / Qwen Team: 4 models, average score 71.3%

Leaderboard

6 models ranked by performance on BFCL-v3

Rank	Model	Organization	Release Date	License	Score
#01	GLM-4.5	Zhipu AI	Jul 28, 2025	MIT	77.8%
#02	GLM-4.5-Air	Zhipu AI	Jul 28, 2025	MIT	76.4%
#03	Qwen3-Next-80B-A3B-Thinking	Alibaba Cloud / Qwen Team	Sep 10, 2025	Apache 2.0	72.0%
#04	Qwen3-235B-A22B-Thinking-2507	Alibaba Cloud / Qwen Team	Jul 25, 2025	Apache 2.0	71.9%
#05	Qwen3-235B-A22B-Instruct-2507	Alibaba Cloud / Qwen Team	Jul 22, 2025	Apache 2.0	70.9%
#06	Qwen3-Next-80B-A3B-Instruct	Alibaba Cloud / Qwen Team	Sep 10, 2025	Apache 2.0	70.3%
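
The summary figures on this page can be recomputed from the leaderboard rows. The sketch below is illustrative only: the `summarize` helper and the row layout are assumptions, the scores are copied from the table above, and the organization figures are assumed to be simple per-organization means.

```python
from collections import defaultdict
from statistics import mean

# (model, organization, score in percent), copied from the leaderboard above.
ROWS = [
    ("GLM-4.5", "Zhipu AI", 77.8),
    ("GLM-4.5-Air", "Zhipu AI", 76.4),
    ("Qwen3-Next-80B-A3B-Thinking", "Alibaba Cloud / Qwen Team", 72.0),
    ("Qwen3-235B-A22B-Thinking-2507", "Alibaba Cloud / Qwen Team", 71.9),
    ("Qwen3-235B-A22B-Instruct-2507", "Alibaba Cloud / Qwen Team", 70.9),
    ("Qwen3-Next-80B-A3B-Instruct", "Alibaba Cloud / Qwen Team", 70.3),
]


def summarize(rows, high_threshold=80.0):
    """Return top score, mean score, high-performer count, and per-org means."""
    scores = [score for _, _, score in rows]
    by_org = defaultdict(list)
    for _, org, score in rows:
        by_org[org].append(score)
    return {
        "top": max(scores),
        "average": round(mean(scores), 1),
        "high_performers": sum(score >= high_threshold for score in scores),
        "org_average": {org: round(mean(v), 1) for org, v in by_org.items()},
    }


if __name__ == "__main__":
    print(summarize(ROWS))
```

With these rows the overall mean rounds to 73.2 and the per-organization means round to 77.1 and 71.3, matching the figures reported above. No model reaches the 80% threshold.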
