BFCL_v3_MultiTurn

text

About

BFCL v3 MultiTurn evaluates how well large language models handle multi-turn function calling with sustained context awareness. The benchmark tests agentic behavior through extended conversations that require multiple function calls, context maintenance, and adaptive reasoning. It measures how well a model manages state across turns while making accurate function calls in a dynamic, conversational setting.
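To make "managing state across turns" concrete, here is a minimal Python sketch of one way a multi-turn function-calling check could work: replay the model's calls and a reference set of calls against a stateful backend, then compare the two states after every turn. This is an illustration only, not the benchmark's actual harness. `ToyFileSystem`, `apply_calls`, and `score_conversation` are hypothetical names invented for this example, and all-or-nothing scoring per conversation is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class ToyFileSystem:
    """Hypothetical stateful backend; not part of the real benchmark."""
    cwd: str = "/"
    files: dict = field(default_factory=dict)  # full path -> contents

    def cd(self, path: str) -> None:
        # Normalize so the working directory always ends with a slash.
        self.cwd = path if path.endswith("/") else path + "/"

    def write(self, name: str, text: str) -> None:
        # Files are created relative to the current working directory.
        self.files[self.cwd + name] = text


def apply_calls(backend, calls):
    """Execute (function_name, kwargs) pairs against the backend."""
    for name, kwargs in calls:
        getattr(backend, name)(**kwargs)


def score_conversation(predicted_turns, reference_turns):
    """Return 1.0 only if the state matches the reference after every turn."""
    if len(predicted_turns) != len(reference_turns):
        return 0.0
    predicted, reference = ToyFileSystem(), ToyFileSystem()
    for pred_calls, ref_calls in zip(predicted_turns, reference_turns):
        apply_calls(predicted, pred_calls)
        apply_calls(reference, ref_calls)
        if predicted != reference:  # dataclass equality compares cwd and files
            return 0.0
    return 1.0


# Turn 1 changes directory; turn 2 writes a file that depends on that context.
reference = [
    [("cd", {"path": "/docs"})],
    [("write", {"name": "a.txt", "text": "hi"})],
]
# A model that skips the turn-1 call diverges from the reference state.
forgetful = [
    [],
    [("write", {"name": "a.txt", "text": "hi"})],
]

scores = [score_conversation(reference, reference),
          score_conversation(forgetful, reference)]
accuracy = sum(scores) / len(scores)  # fraction of conversations passed
```

In this sketch the second conversation fails at turn 1, because the skipped `cd` call leaves the working directory different from the reference. Comparing state after each turn, rather than only at the end, is a design choice made here for illustration.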

Evaluation Stats

Total Models: 1

Organizations: 1

Verified Results: 0

Self-Reported: 1

Benchmark Details

Max Score: 1

Language

Performance Overview

Score distribution and top performers

Score Distribution

1 model

Top Score

66.9%

Average Score

66.9%

High Performers (80%+)

Top Organizations

#1 NVIDIA

1 model

66.9%

Leaderboard

1 model ranked by performance on BFCL_v3_MultiTurn

Rank	Model	Organization	Date	License	Score	Links
#01	Nemotron Nano 9B v2	NVIDIA	Aug 18, 2025	NVIDIA Open Model License Agreement	66.9%
