BFCL v2

Multilingual

text

About

BFCL v2 (Berkeley Function Calling Leaderboard v2) is the second version of the Berkeley Function Calling Leaderboard, a benchmark that evaluates how well LLMs call functions (tools). Compared with the original BFCL, it refines the accuracy metrics, extends the multi-turn scenarios, and adds more real-world function-calling cases. The focus remains on tool-use evaluation, with broader test coverage than the first version.

Evaluation Stats

Total Models: 5

Organizations: 2

Verified Results: 0

Self-Reported: 5

Benchmark Details

Max Score: 1

Language

Performance Overview

Score distribution and top performers

Score Distribution

5 models

Top Score: 77.3%

Average Score: 71.1%

High Performers (80%+): 0

Top Organizations

#1 Meta, 2 models, 72.2%

#2 NVIDIA, 3 models, 70.5%

Leaderboard

5 models ranked by performance on BFCL v2

Rank	Model	Organization	Release Date	License	Score
#01	Llama 3.3 70B Instruct	Meta	Dec 6, 2024	Llama 3.3 Community License Agreement	77.3%
#02	Llama 3.1 Nemotron Ultra 253B v1	NVIDIA	Apr 7, 2025	Llama 3.1 Community License	74.1%
#03	Llama-3.3 Nemotron Super 49B v1	NVIDIA	Mar 18, 2025	Llama 3.1 Community License	73.7%
#04	Llama 3.2 3B Instruct	Meta	Sep 25, 2024	Llama 3.2 Community License	67.0%
#05	Llama 3.1 Nemotron Nano 8B V1	NVIDIA	Mar 18, 2025	Llama 3.1 Community License	63.6%
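
The summary figures on this page (top score, average score, per-organization averages) can be reproduced from the five leaderboard rows. The sketch below is one way to do it, not the site's actual code. It assumes the averages are unweighted means rounded half-up to one decimal place. The names `ROWS`, `mean_pct`, and `summarize` are hypothetical and used only for illustration.

```python
from collections import defaultdict
from decimal import Decimal, ROUND_HALF_UP

# (model, organization, score in percent), copied from the leaderboard rows.
ROWS = [
    ("Llama 3.3 70B Instruct", "Meta", Decimal("77.3")),
    ("Llama 3.1 Nemotron Ultra 253B v1", "NVIDIA", Decimal("74.1")),
    ("Llama-3.3 Nemotron Super 49B v1", "NVIDIA", Decimal("73.7")),
    ("Llama 3.2 3B Instruct", "Meta", Decimal("67.0")),
    ("Llama 3.1 Nemotron Nano 8B V1", "NVIDIA", Decimal("63.6")),
]


def mean_pct(scores):
    """Unweighted mean, rounded half-up to one decimal (assumed rounding rule)."""
    total = sum(scores, Decimal("0"))
    return (total / len(scores)).quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


def summarize(rows):
    """Compute the page's summary statistics from the raw rows."""
    by_org = defaultdict(list)
    for _model, org, score in rows:
        by_org[org].append(score)
    all_scores = [score for _model, _org, score in rows]
    return {
        "top": max(all_scores),
        "average": mean_pct(all_scores),
        "by_org": {org: mean_pct(scores) for org, scores in by_org.items()},
        # Number of models at or above the 80% "high performer" threshold.
        "high_performers": sum(1 for s in all_scores if s >= Decimal("80")),
    }


summary = summarize(ROWS)
print("top:", summary["top"], "average:", summary["average"])
# Organizations ordered by their mean score, highest first.
for org, avg in sorted(summary["by_org"].items(), key=lambda kv: kv[1], reverse=True):
    print(org, avg)
```

`Decimal` is used instead of floats so that the Meta average of 72.15 rounds to 72.2 deterministically. Under these assumptions the computed values agree with the figures shown on this page (71.1% overall, 72.2% for Meta, 70.5% for NVIDIA), and no listed model reaches the 80% threshold.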
