API-Bank

About

API-Bank is a comprehensive benchmark for tool-augmented Large Language Models that evaluates their ability to find, plan, and execute API calls to accomplish user-defined goals. Featuring 264 annotated dialogues with 568 API calls, it tests LLMs across three key capabilities: deciding when to call APIs, finding relevant tools via keyword search, and employing multiple APIs for complex requests. The benchmark uses automatic and manual evaluation to assess tool-augmented reasoning.
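To make the "finding relevant tools via keyword search" capability concrete, here is a minimal sketch of keyword-based tool lookup. It is only an illustration: the tool names, descriptions, and the `search_tools` function are hypothetical and are not taken from API-Bank's actual API pool or retrieval code.

```python
# Hypothetical tool registry: names and descriptions are invented for
# illustration and do not come from the API-Bank benchmark itself.
TOOLS = {
    "get_weather": "Return the current weather forecast for a city",
    "add_alarm": "Create an alarm at a given time",
    "query_stock": "Look up the stock price for a ticker symbol",
}


def search_tools(keywords: str, tools: dict[str, str] = TOOLS) -> list[str]:
    """Rank tools by how many query keywords appear in their name or description."""
    query = set(keywords.lower().split())
    scored = []
    for name, description in tools.items():
        # Compare against the description words plus the parts of the tool name.
        words = set(description.lower().split()) | set(name.lower().split("_"))
        overlap = len(query & words)
        if overlap:
            scored.append((overlap, name))
    # Highest overlap first; ties are broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored]


if __name__ == "__main__":
    print(search_tools("weather forecast"))
```

A model under test would issue such a search before calling an API it has not been shown directly; a real retriever would likely use a stronger matching method than plain word overlap.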

Evaluation Stats
Total Models: 3
Organizations: 1
Verified Results: 0
Self-Reported: 3
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers

Score Distribution

Models: 3
Top Score: 92.0%
Average Score: 88.2%
High Performers (80%+): 3
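
The summary figures can be checked against the three leaderboard scores listed on this page (92.0%, 90.0%, 82.6%). The short sketch below recomputes them; the variable names are illustrative, and the 80% threshold is taken from the "High Performers (80%+)" label.

```python
# Scores (in percent) as listed on the API-Bank leaderboard on this page.
scores = [92.0, 90.0, 82.6]

# Best single score.
top_score = max(scores)

# Arithmetic mean, rounded to one decimal place as displayed on the page.
average = round(sum(scores) / len(scores), 1)

# Number of models at or above the 80% "high performer" threshold.
high_performers = sum(1 for s in scores if s >= 80.0)

print(f"Top: {top_score}%  Average: {average}%  High performers: {high_performers}")
```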

Top Organizations

#1 Meta: 3 models, 88.2%
Leaderboard
3 models ranked by performance on API-Bank
Date | License | Score
Jul 23, 2024 | Llama 3.1 Community License | 92.0%
Jul 23, 2024 | Llama 3.1 Community License | 90.0%
Jul 23, 2024 | Llama 3.1 Community License | 82.6%
Resources