Gorilla Benchmark API Bench
About
Gorilla Benchmark API Bench evaluates large language models' ability to use APIs and tools accurately through function-calling tasks. Developed at UC Berkeley, the benchmark tests whether a model can read API documentation, select the appropriate function, and generate a correct call to it, making it a useful gauge of an LLM's practical utility in real-world API integration scenarios.
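To make the task concrete: given a natural-language request, the model must emit a single API call, which is then scored against reference calls drawn from an API database (the Gorilla project uses AST matching for this). The Python sketch below is a simplified stand-in for that kind of check, not the official evaluation harness; the function name `call_matches_reference` and the `pipeline(...)` example are illustrative assumptions.

```python
import ast

def call_matches_reference(generated: str, reference: str) -> bool:
    """Return True if `generated` invokes the same function as `reference`
    with matching keyword arguments. A simplified stand-in for the AST
    matching used by Gorilla-style evaluation; positional arguments are
    ignored in this sketch."""
    try:
        gen = ast.parse(generated, mode="eval").body
        ref = ast.parse(reference, mode="eval").body
    except SyntaxError:
        return False
    if not (isinstance(gen, ast.Call) and isinstance(ref, ast.Call)):
        return False
    # The called function itself must match exactly.
    if ast.dump(gen.func) != ast.dump(ref.func):
        return False
    # Every keyword argument in the reference must appear in the
    # generated call with the same value; extra arguments are tolerated.
    gen_kwargs = {kw.arg: ast.dump(kw.value) for kw in gen.keywords}
    return all(gen_kwargs.get(kw.arg) == ast.dump(kw.value)
               for kw in ref.keywords)

# Illustrative check in the style of APIBench's Hugging Face split:
assert call_matches_reference(
    'pipeline(task="translation", model="t5-base")',
    'pipeline(task="translation")',
)
```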
Evaluation Stats
Total Models: 3
Organizations: 1
Verified Results: 0
Self-Reported: 3
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers
Score Distribution: 3 models
Top Score: 35.3%
Average Score: 24.4%
High Performers (80%+): 0

Top Organizations
#1 Meta: 3 models, 24.4% average score
Leaderboard
3 models ranked by performance on Gorilla Benchmark API Bench
| Rank | Model | Release Date | License | Score |
|---|---|---|---|---|
| 1 | Llama 3.1 405B | Jul 23, 2024 | Llama 3.1 Community License | 35.3% |
| 2 | Llama 3.1 70B | Jul 23, 2024 | Llama 3.1 Community License | 29.7% |
| 3 | Llama 3.1 8B | Jul 23, 2024 | Llama 3.1 Community License | 8.2% |