BFCL v2

Multilingual
text
About

BFCL v2 (Berkeley Function Calling Leaderboard v2) is the second version of the Berkeley Function Calling Leaderboard, a benchmark that evaluates how well LLMs call functions (tools). Building on the original BFCL framework, it adds refined accuracy metrics, enhanced multi-turn scenarios, and additional real-world function calling challenges. The focus remains on tool usage, with broader test coverage than the original.
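
To make "accuracy assessment" for function calling concrete, the sketch below scores a predicted call as correct only when its function name and arguments exactly match a reference call. This is an illustration under stated assumptions, not BFCL's actual scoring procedure: the exact-match rule, the helper names `call_matches` and `accuracy`, and the `get_weather` function are all hypothetical. The result is a fraction between 0 and 1, consistent with the Max Score of 1 listed under Benchmark Details.

```python
def call_matches(predicted: dict, expected: dict) -> bool:
    """Return True when the function name and all arguments are identical.

    Assumption: exact match is used here for simplicity; a real benchmark
    may apply more lenient or more structured comparisons.
    """
    return (
        predicted.get("name") == expected.get("name")
        and predicted.get("arguments") == expected.get("arguments")
    )


def accuracy(pairs) -> float:
    """Fraction of (predicted, expected) pairs that match, in [0, 1]."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    return sum(call_matches(p, e) for p, e in pairs) / len(pairs)


# Hypothetical reference call and two model outputs.
expected = {"name": "get_weather",
            "arguments": {"city": "Berkeley", "unit": "celsius"}}
good = {"name": "get_weather",
        "arguments": {"city": "Berkeley", "unit": "celsius"}}
bad = {"name": "get_weather",
       "arguments": {"city": "Berkeley"}}  # missing the "unit" argument

score = accuracy([(good, expected), (bad, expected)])
print(score)
```

In this toy run one of the two calls matches, so the score is half of the maximum.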

Evaluation Stats
Total Models: 5
Organizations: 2
Verified Results: 0
Self-Reported: 5
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers

Score Distribution

Models: 5
Top Score: 77.3%
Average Score: 71.1%
High Performers (80%+): 0

Top Organizations

#1 Meta (2 models): 72.2%
#2 NVIDIA (3 models): 70.5%
Leaderboard
5 models ranked by performance on BFCL v2
Rank  Date          License                                Score
1     Dec 6, 2024   Llama 3.3 Community License Agreement  77.3%
2     Apr 7, 2025   Llama 3.1 Community License            74.1%
3     Mar 18, 2025  Llama 3.1 Community License            73.7%
4     Sep 25, 2024  Llama 3.2 Community License            67.0%
5     Mar 18, 2025  Llama 3.1 Community License            63.6%
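
The summary figures under Score Distribution can be recomputed from the five leaderboard scores. The sketch below is illustrative only: the `summarize` helper and its `high_threshold` parameter are hypothetical names, and the scores are entered as percentages exactly as listed in the leaderboard.

```python
from statistics import mean

# Leaderboard scores in percent, in ranked order.
scores = [77.3, 74.1, 73.7, 67.0, 63.6]


def summarize(scores, high_threshold=80.0):
    """Recompute the Score Distribution panel from raw scores.

    Assumption: "High Performers (80%+)" counts scores at or above 80.
    """
    return {
        "models": len(scores),
        "top": max(scores),
        "average": round(mean(scores), 1),
        "high_performers": sum(s >= high_threshold for s in scores),
    }


summary = summarize(scores)
print(summary)
```

The recomputed values agree with the page: 5 models, a top score of 77.3%, an average of 71.1%, and no model at or above 80%.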
Resources