ComplexFuncBench

text

About

ComplexFuncBench is a function calling benchmark that evaluates how well Large Language Models handle multi-step function calling scenarios with complex dependencies between calls. It tests whether a model can interpret intricate function signatures, pass data correctly from one call to the next, and execute multi-layered function calls accurately, capabilities that go beyond single, simple function calls.
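To make "multi-step function calling with dependencies" concrete, here is a minimal sketch of a two-call chain in which the second call needs a value returned by the first. The tool names (`search_location`, `search_hotels`), their return shapes, and the `{"from": ..., "key": ...}` reference convention are all hypothetical and chosen only for illustration; they are not taken from the benchmark itself.

```python
from typing import Any, Callable

# Hypothetical tools: names and return shapes are illustrative assumptions.
def search_location(query: str) -> dict[str, Any]:
    return {"location_id": f"loc-{query.lower()}"}

def search_hotels(location_id: str, nights: int) -> dict[str, Any]:
    return {"hotels": [f"{location_id}-hotel-a"], "nights": nights}

TOOLS: dict[str, Callable[..., dict[str, Any]]] = {
    "search_location": search_location,
    "search_hotels": search_hotels,
}

def run_calls(calls: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Execute calls in order, resolving arguments that point at earlier outputs."""
    outputs: list[dict[str, Any]] = []
    for call in calls:
        args = {}
        for name, value in call["arguments"].items():
            # An argument of the form {"from": i, "key": k} is read from the output of call i.
            if isinstance(value, dict) and "from" in value:
                value = outputs[value["from"]][value["key"]]
            args[name] = value
        outputs.append(TOOLS[call["name"]](**args))
    return outputs

calls = [
    {"name": "search_location", "arguments": {"query": "Paris"}},
    {
        "name": "search_hotels",
        "arguments": {"location_id": {"from": 0, "key": "location_id"}, "nights": 2},
    },
]
results = run_calls(calls)
print(results[-1])
```

A model that gets the dependency wrong (for example, passing the raw query string instead of the returned `location_id`) would fail the second step even if each call looks well-formed in isolation.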

Evaluation Stats

Total Models: 6

Organizations: 1

Verified Results: 0

Self-Reported: 6

Benchmark Details

Max Score: 1

Language

Performance Overview

Score distribution and top performers

Score Distribution: 6 models

Top Score: 66.5%

Average Score: 44.6%

High Performers (80%+): 0

Top Organizations

#1 OpenAI: 6 models, 44.6% average score

Leaderboard

6 models ranked by performance on ComplexFuncBench

Rank	Model	Organization	Release Date	License	Score
#01	GPT-4o	OpenAI	Aug 6, 2024	Proprietary	66.5%
#02	GPT-4.1	OpenAI	Apr 14, 2025	Proprietary	65.5%
#03	GPT-4.5	OpenAI	Feb 27, 2025	Proprietary	63.0%
#04	GPT-4.1 mini	OpenAI	Apr 14, 2025	Proprietary	49.3%
#05	o3-mini	OpenAI	Jan 30, 2025	Proprietary	17.6%
#06	GPT-4.1 nano	OpenAI	Apr 14, 2025	Proprietary	5.7%
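
The summary figures in the Performance Overview can be reproduced from the leaderboard rows. The sketch below assumes the average is a plain arithmetic mean of the six listed scores and that "high performer" means a score of at least 80%; the variable names are illustrative.

```python
from statistics import mean

# Scores in percent, copied from the leaderboard rows.
scores = {
    "GPT-4o": 66.5,
    "GPT-4.1": 65.5,
    "GPT-4.5": 63.0,
    "GPT-4.1 mini": 49.3,
    "o3-mini": 17.6,
    "GPT-4.1 nano": 5.7,
}

top_model = max(scores, key=scores.get)          # model with the highest score
top_score = scores[top_model]
average = round(mean(scores.values()), 1)        # unweighted mean, one decimal place
high_performers = [m for m, s in scores.items() if s >= 80.0]

print(f"Top: {top_model} ({top_score}%)")
print(f"Average: {average}%")
print(f"High performers (80%+): {len(high_performers)}")
```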

Resources

Research Paper