BFCL_v3_MultiTurn
text
About
BFCL v3 MultiTurn evaluates how well large language models handle multi-turn function calling. It tests agentic behavior over extended conversations that require multiple function calls, context maintenance, and adaptive reasoning. The benchmark measures whether a model can track state across turns while still making accurate function calls in a dynamic, conversational environment.
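To make "managing state across turns" concrete, here is a minimal Python sketch of a state-based multi-turn check. It is an illustration only, not the benchmark's actual harness: the `ToyFileSystem` environment, its three functions (`mkdir`, `cd`, `touch`), the `score_conversation` helper, and the all-or-nothing scoring rule are all assumptions invented for this example.

```python
from dataclasses import dataclass, field

# A function call is (function_name, keyword_arguments). Hypothetical format.
Call = tuple[str, dict]


@dataclass
class ToyFileSystem:
    """Hypothetical stateful environment; its state persists across turns."""
    cwd: str = "/"
    files: dict[str, set[str]] = field(default_factory=lambda: {"/": set()})

    def _child(self, name: str) -> str:
        return self.cwd.rstrip("/") + "/" + name

    def mkdir(self, name: str) -> None:
        self.files.setdefault(self._child(name), set())

    def cd(self, name: str) -> None:
        path = self._child(name)
        if path not in self.files:
            raise FileNotFoundError(path)
        self.cwd = path

    def touch(self, name: str) -> None:
        self.files[self.cwd].add(name)


ALLOWED = {"mkdir", "cd", "touch"}


def apply_turn(env: ToyFileSystem, calls: list[Call]) -> None:
    """Execute every call of one turn against the environment."""
    for name, kwargs in calls:
        if name not in ALLOWED:
            raise ValueError(f"unknown function: {name}")
        getattr(env, name)(**kwargs)


def snapshot(env: ToyFileSystem):
    """Comparable view of the environment state."""
    return env.cwd, {path: frozenset(names) for path, names in env.files.items()}


def score_conversation(model_turns: list[list[Call]],
                       reference_turns: list[list[Call]]) -> float:
    """Return 1.0 if the model's state matches the reference after every turn,
    else 0.0 (an assumed all-or-nothing rule for this sketch)."""
    if len(model_turns) != len(reference_turns):
        return 0.0
    model_env, ref_env = ToyFileSystem(), ToyFileSystem()
    for model_calls, ref_calls in zip(model_turns, reference_turns):
        apply_turn(ref_env, ref_calls)
        try:
            apply_turn(model_env, model_calls)
        except (ValueError, TypeError, FileNotFoundError):
            return 0.0  # invalid or failing call
        if snapshot(model_env) != snapshot(ref_env):
            return 0.0  # state diverged at this turn
    return 1.0


# Turn 1: "make a docs folder". Turn 2: "put a.txt in it".
reference = [
    [("mkdir", {"name": "docs"})],
    [("cd", {"name": "docs"}), ("touch", {"name": "a.txt"})],
]
# A model that forgets the folder created in turn 1 and writes to "/" instead.
forgetful = [
    [("mkdir", {"name": "docs"})],
    [("touch", {"name": "a.txt"})],
]

print(score_conversation(reference, reference))  # 1.0
print(score_conversation(forgetful, reference))  # 0.0
```

In this sketch each individual call in the second conversation is valid on its own; it fails only because the model lost track of what happened in the earlier turn, which is the kind of error a multi-turn evaluation is meant to expose.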
Evaluation Stats
Total Models: 0
Organizations: 0
Verified Results: 0
Self-Reported: 0
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers
No evaluation results available for this benchmark