MCP-Atlas
Tool Use
About
MCP-Atlas evaluates language models on real-world tool use through the Model Context Protocol across 36 MCP servers and 220 tools, testing multi-step workflows requiring sequential tool orchestration.
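To make "sequential tool orchestration" concrete, here is a minimal sketch of the kind of multi-step workflow the benchmark tests: each tool call depends on the output of the previous one. The tools, task, and registry here are hypothetical illustrations, not actual MCP-Atlas servers or the MCP SDK.

```python
from typing import Any, Callable

# Toy "MCP server" tool registry: tool name -> callable.
# These three tools are invented for illustration only.
TOOLS: dict[str, Callable[..., Any]] = {
    "search_files": lambda query: ["report_q3.csv"],
    "read_file": lambda path: "revenue,1200\ncost,800",
    "compute_sum": lambda rows, column: sum(r[column] for r in rows),
}

def call_tool(name: str, **kwargs: Any) -> Any:
    """Dispatch a single tool call, as an MCP client would."""
    return TOOLS[name](**kwargs)

def run_workflow() -> int:
    # Step 1: locate the relevant file.
    paths = call_tool("search_files", query="Q3 report")
    # Step 2: read its contents (requires step 1's output).
    text = call_tool("read_file", path=paths[0])
    rows = [{"label": line.split(",")[0], "value": int(line.split(",")[1])}
            for line in text.splitlines()]
    # Step 3: aggregate (requires step 2's output).
    return call_tool("compute_sum", rows=rows, column="value")

print(run_workflow())  # 2000
```

A model scores well on tasks like this only if it sequences the calls correctly and threads intermediate results through; skipping or reordering a step breaks the chain.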
Evaluation Stats
- Total Models: 18
- Organizations: 8
- Verified Results: 0
- Self-Reported: 0
Benchmark Details
- Max Score: 100
Performance Overview
Score distribution and top performers
Score Distribution (18 models)
- Top Score: 62.3%
- Average Score: 33.7%
- High Performers (80%+): 0

Top Organizations
| Rank | Organization | Models | Avg Score |
|---|---|---|---|
| 1 | Anthropic | 5 | 48.8% |
| 2 | OpenAI | 5 | 40.1% |
| 3 | Zhipu AI | 1 | 34.0% |
| 4 | Amazon | 1 | 24.6% |
| 5 | Moonshot AI | 1 | 23.9% |
Leaderboard
18 models ranked by performance on MCP-Atlas
| Release Date | License | Score |
|---|---|---|
| Nov 24, 2025 | Proprietary | 62.3% |
| Feb 17, 2026 | Proprietary | 61.3% |
| Dec 11, 2025 | Proprietary | 60.6% |
| Nov 18, 2025 | Proprietary | 54.1% |
| Nov 12, 2025 | Proprietary | 44.5% |
| Aug 7, 2025 | Proprietary | 44.5% |
| Sep 29, 2025 | Proprietary | 43.8% |
| Apr 16, 2025 | Proprietary | 43.6% |
| Aug 5, 2025 | Proprietary | 40.9% |
| May 22, 2025 | Proprietary | 35.6% |
Showing the top 10 of 18 models.
Additional Metrics
Extended metrics for top models on MCP-Atlas
| Model | Score | Avg Coverage |
|---|---|---|
| Claude Opus 4.5 | 62.3 | 78.5% |
| GPT-5.2 | 60.6 | 80.35% |
| Gemini 3 Pro | 54.1 | 73.2% |
| GPT-5.1 | 44.5 | 64.65% |
| GPT-5 | 44.5 | 61.75% |
| Claude Sonnet 4.5 | 43.8 | 62.17% |
| o3 | 43.6 | 66.91% |
| Claude Opus 4.1 | 40.9 | 64.99% |
| Claude Sonnet 4 | 35.6 | 57.35% |
| GLM-4.5 Air | 34.0 | 60.59% |
| Nova 2 Lite | 24.6 | 48.8% |
| Kimi K2 | 23.9 | 50.41% |
| Qwen3-235B-A22B | 12.0 | 29.06% |
| Gemini 2.5 Pro | 8.8 | 30.77% |
| GPT-4o | 7.2 | 28.53% |
| Gemini 2.5 Flash | 3.4 | 17.83% |
| Llama 4 Maverick | 0.8 | 13.03% |
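As a sanity check, the Performance Overview aggregates can be recomputed from the per-model scores on this page: the 17 scores in the metrics table above plus the 61.3 entry that appears only in the leaderboard (its model name is not captured there).

```python
# Per-model scores taken from this page's tables.
scores = [62.3, 61.3, 60.6, 54.1, 44.5, 44.5, 43.8, 43.6, 40.9, 35.6,
          34.0, 24.6, 23.9, 12.0, 8.8, 7.2, 3.4, 0.8]

top = max(scores)                           # top score: 62.3
avg = round(sum(scores) / len(scores), 1)   # average score: 33.7
high = sum(s >= 80 for s in scores)         # high performers (80%+): 0

print(top, avg, high)  # 62.3 33.7 0
```

The recomputed values match the reported Top Score (62.3%), Average Score (33.7%), and the count of zero models at 80% or above.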