BFCL_v3_MultiTurn

About

BFCL v3 MultiTurn is the multi-turn portion of version 3 of the Berkeley Function-Calling Leaderboard (BFCL). It evaluates how well large language models handle multi-turn function calling: each task is an extended conversation that requires several function calls, so the model must carry context forward from earlier turns and adapt as the conversation evolves. The benchmark measures whether a model keeps track of state across turns while still issuing accurate function calls.
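To make "managing state across turns" concrete, the sketch below shows one plausible way a multi-turn check could work: every turn's function calls run against a single shared backend, and the resulting state is compared with a reference after each turn. This is an illustrative toy, not the benchmark's actual harness; `ToyFileSystem`, `run_turns`, and `conversation_passes` are hypothetical names invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class ToyFileSystem:
    """Hypothetical stateful tool backend; its state persists across turns."""
    cwd: str = "/"
    dirs: set = field(default_factory=lambda: {"/"})

    def mkdir(self, name: str) -> None:
        self.dirs.add(self.cwd.rstrip("/") + "/" + name)

    def cd(self, name: str) -> None:
        target = self.cwd.rstrip("/") + "/" + name
        if target not in self.dirs:
            raise FileNotFoundError(target)
        self.cwd = target


def run_turns(turns):
    """Run each turn's calls on one shared backend and snapshot state per turn."""
    fs = ToyFileSystem()
    snapshots = []
    for calls in turns:
        for name, args in calls:
            try:
                getattr(fs, name)(**args)
            except FileNotFoundError:
                pass  # a failed call leaves the state unchanged
        snapshots.append((fs.cwd, frozenset(fs.dirs)))
    return snapshots


def conversation_passes(model_turns, reference_turns):
    """Pass only if the state matches the reference after every turn."""
    return run_turns(model_turns) == run_turns(reference_turns)


# Each turn is a list of (function name, keyword arguments) pairs.
reference = [
    [("mkdir", {"name": "reports"})],
    [("cd", {"name": "reports"}), ("mkdir", {"name": "q3"})],
]
# This trace ignores the directory created in turn 1 and works in "/" instead.
forgetful = [
    [("mkdir", {"name": "reports"})],
    [("mkdir", {"name": "q3"})],
]

print(conversation_passes(reference, reference))  # True
print(conversation_passes(forgetful, reference))  # False
```

The second trace issues a syntactically valid call in turn 2 but fails the check because it lost track of what turn 1 set up, which is the kind of error a single-turn test cannot surface.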

Evaluation Stats
Total Models: 1
Organizations: 1
Verified Results: 0
Self-Reported: 1
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers

Score Distribution

1 model
Top Score
66.9%
Average Score
66.9%
High Performers (80%+)
0
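The summary figures above follow directly from the list of scores. As a minimal sketch, assuming scores are stored on the 0 to 1 scale implied by the maximum score of 1 and that "80%+" means a score of at least 0.80, they could be computed as follows; `summarize` is a hypothetical helper, not part of any leaderboard tooling.

```python
from statistics import mean


def summarize(scores, high_threshold=0.80):
    """Summary stats for a list of scores on a 0-1 scale."""
    return {
        "models": len(scores),
        "top": max(scores),
        "average": mean(scores),
        # Assumption: "80%+" is inclusive of exactly 80%.
        "high_performers": sum(s >= high_threshold for s in scores),
    }


summary = summarize([0.669])  # the single score reported on this page
print(f"{summary['top']:.1%}")       # 66.9%
print(f"{summary['average']:.1%}")   # 66.9%
print(summary["high_performers"])    # 0
```

With only one model listed, the top and average scores coincide by construction.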

Top Organizations

#1 NVIDIA
1 model
66.9%
Leaderboard
1 model ranked by performance on BFCL_v3_MultiTurn
License Links
Aug 18, 2025
NVIDIA Open Model License Agreement
66.9%
Resources