Gorilla Benchmark API Bench
About
Gorilla Benchmark API Bench evaluates large language models' ability to use APIs and tools accurately through function-calling tasks. Developed at UC Berkeley, the benchmark tests whether a model can read API documentation, select the appropriate function, and generate a correct call to it, making it a useful gauge of an LLM's practical utility in real-world API integration scenarios.
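To make the task concrete: given a natural-language request, the model must emit a single API call, which is then scored against reference calls drawn from an API database (the Gorilla project uses AST matching for this). The Python sketch below is a simplified stand-in for that kind of check, not the official evaluation harness; the function name `call_matches_reference` and the `pipeline(...)` example are illustrative assumptions.

```python
import ast

def call_matches_reference(generated: str, reference: str) -> bool:
    """Return True if `generated` invokes the same function as `reference`
    with matching keyword arguments. A simplified stand-in for the AST
    matching used by Gorilla-style evaluation; positional arguments are
    ignored in this sketch."""
    try:
        gen = ast.parse(generated, mode="eval").body
        ref = ast.parse(reference, mode="eval").body
    except SyntaxError:
        return False
    if not (isinstance(gen, ast.Call) and isinstance(ref, ast.Call)):
        return False
    # The called function itself must match exactly.
    if ast.dump(gen.func) != ast.dump(ref.func):
        return False
    # Every keyword argument in the reference must appear in the
    # generated call with the same value; extra arguments are tolerated.
    gen_kwargs = {kw.arg: ast.dump(kw.value) for kw in gen.keywords}
    return all(gen_kwargs.get(kw.arg) == ast.dump(kw.value)
               for kw in ref.keywords)

# Illustrative check in the style of APIBench's Hugging Face split:
assert call_matches_reference(
    'pipeline(task="translation", model="t5-base")',
    'pipeline(task="translation")',
)
```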
Evaluation Stats
Total Models: 3
Organizations: 1
Verified Results: 0
Self-Reported: 3
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers
Score Distribution: 3 models
Top Score: 35.3%
Average Score: 24.4%
High Performers (80%+): 0

Top Organizations
#1 Meta: 3 models, 24.4% average score
Leaderboard
3 models ranked by performance on Gorilla Benchmark API Bench
| Rank | Model | Release Date | License | Score |
|---|---|---|---|---|
| 1 | Llama 3.1 405B | Jul 23, 2024 | Llama 3.1 Community License | 35.3% |
| 2 | Llama 3.1 70B | Jul 23, 2024 | Llama 3.1 Community License | 29.7% |
| 3 | Llama 3.1 8B | Jul 23, 2024 | Llama 3.1 Community License | 8.2% |