API-Bank
About
API-Bank is a comprehensive benchmark for tool-augmented Large Language Models that evaluates their ability to find, plan, and execute API calls to accomplish user-defined goals. Featuring 264 annotated dialogues with 568 API calls, it tests LLMs across three key capabilities: deciding when to call APIs, finding relevant tools via keyword search, and employing multiple APIs for complex requests. The benchmark uses automatic and manual evaluation to assess tool-augmented reasoning.
Evaluation Stats
Total Models: 3
Organizations: 1
Verified Results: 0
Self-Reported: 3
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers
Score Distribution: 3 models
Top Score: 92.0%
Average Score: 88.2%
High Performers (80%+): 3
Top Organizations
#1 Meta (3 models, 88.2%)
Leaderboard
3 models ranked by performance on API-Bank
Date | License | Score | Links
---|---|---|---
Jul 23, 2024 | Llama 3.1 Community License | 92.0% |
Jul 23, 2024 | Llama 3.1 Community License | 90.0% |
Jul 23, 2024 | Llama 3.1 Community License | 82.6% |
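
The summary figures in the Performance Overview can be checked against the three leaderboard scores. The snippet below is a minimal sketch, assuming the scores are exactly the three listed in the table and that "High Performers" means a score of at least 80%. The variable names are illustrative and do not come from any API-Bank tooling.

```python
from statistics import mean

# Scores copied from the three leaderboard rows (in percent).
scores = [92.0, 90.0, 82.6]

# Top score is the maximum of the listed scores.
top_score = max(scores)

# Average score, rounded to one decimal place as on the page.
average_score = round(mean(scores), 1)

# Assumed definition: a "high performer" scores 80% or more.
high_performers = sum(1 for s in scores if s >= 80.0)

print(f"Top: {top_score}%  Average: {average_score}%  High performers: {high_performers}")
```

Under these assumptions the computed values match the page: a top score of 92.0%, an average of 88.2%, and three high performers.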