API-Bank

About

API-Bank is a benchmark for tool-augmented large language models (LLMs) that evaluates their ability to find, plan, and execute API calls to accomplish user-defined goals. It contains 264 annotated dialogues with 568 API calls and tests LLMs on three capabilities: deciding when to call an API, finding relevant tools via keyword search, and combining multiple APIs to handle complex requests. Tool-augmented reasoning is assessed with both automatic and manual evaluation.
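The keyword-search capability described above can be illustrated with a small sketch. This is not the benchmark's own retrieval code: the registry contents, the function name `search_apis`, and the scoring rule (counting keyword hits in each description) are all assumptions made for illustration.

```python
# Hypothetical API registry; the names and descriptions are illustrative
# only and are not taken from the API-Bank dataset.
API_REGISTRY = {
    "get_weather": "query the weather forecast for a city",
    "add_alarm": "set an alarm clock at a given time",
    "book_hotel": "book a hotel room in a city",
}


def search_apis(keywords, registry=API_REGISTRY):
    """Rank APIs by how many keywords appear as words in their description."""
    scored = []
    for name, description in registry.items():
        words = set(description.lower().split())
        hits = sum(1 for kw in keywords if kw.lower() in words)
        if hits > 0:
            scored.append((hits, name))
    # Highest hit count first; ties are broken alphabetically by name.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored]


print(search_apis(["hotel", "city"]))
```

A model under test would be expected to choose suitable keywords, inspect the returned candidates, and then issue the actual API call; the sketch covers only the lookup step.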

Evaluation Stats

Total Models: 3

Organizations: 1

Verified Results: 0

Self-Reported: 3

Benchmark Details

Max Score: 1

Language

Performance Overview

Score distribution and top performers

Score Distribution

3 models

Top Score

92.0%

Average Score

88.2%

High Performers (80%+)

Top Organizations

#1 Meta

3 models

88.2%

Leaderboard

3 models ranked by performance on API-Bank

Rank	Model	Organization	Release Date	License	Score
#01	Llama 3.1 405B Instruct	Meta	Jul 23, 2024	Llama 3.1 Community License	92.0%
#02	Llama 3.1 70B Instruct	Meta	Jul 23, 2024	Llama 3.1 Community License	90.0%
#03	Llama 3.1 8B Instruct	Meta	Jul 23, 2024	Llama 3.1 Community License	82.6%
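
The summary figures in the Performance Overview can be reproduced from the three leaderboard rows above. The sketch below assumes the average is a plain arithmetic mean rounded to one decimal place and that "High Performers" means a score of at least 80%; the variable names are illustrative.

```python
# Scores (in percent) copied from the leaderboard rows.
leaderboard = [
    ("Llama 3.1 405B Instruct", 92.0),
    ("Llama 3.1 70B Instruct", 90.0),
    ("Llama 3.1 8B Instruct", 82.6),
]

scores = [score for _, score in leaderboard]

top_score = max(scores)
# Assumption: unweighted mean, rounded to one decimal place.
average_score = round(sum(scores) / len(scores), 1)
# Assumption: "High Performers (80%+)" counts scores of at least 80.0.
high_performers = [name for name, score in leaderboard if score >= 80.0]

print(f"Top score: {top_score}%")
print(f"Average score: {average_score}%")
print(f"High performers: {len(high_performers)}")
```

Under these assumptions the computed values match the page: a top score of 92.0% and an average of 88.2%, with all three models above the 80% threshold.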

Resources

Research Paper