MCP-Atlas

Tool Use
About

MCP-Atlas evaluates language models on real-world tool use through the Model Context Protocol (MCP), spanning 36 MCP servers and 220 tools. Tasks are multi-step workflows that require sequential tool orchestration.
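Tasks of this kind exercise the standard MCP client flow: connect to a server, discover its tools, and invoke them in sequence. The snippet below is a minimal sketch of that flow using the MCP Python SDK; the npx filesystem server and the list_directory call are illustrative placeholders, not servers or tools drawn from MCP-Atlas.

import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# Illustrative server only; MCP-Atlas covers 36 servers that are not named here.
server = StdioServerParameters(
    command="npx",
    args=["-y", "@modelcontextprotocol/server-filesystem", "."],
)

async def main() -> None:
    # Open a stdio transport to the server and wrap it in a client session.
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # Discover the tools this server exposes.
            tools = await session.list_tools()
            print([tool.name for tool in tools.tools])

            # Invoke a single tool. A multi-step workflow chains several such
            # calls, feeding each result back to the model before the next one.
            result = await session.call_tool("list_directory", arguments={"path": "."})
            print(result.content)

asyncio.run(main())

A full MCP-Atlas task would chain many such calls, with the model choosing the next tool based on the results of earlier ones.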

Evaluation Stats
Total Models: 18
Organizations: 8
Verified Results: 0
Self-Reported: 0

Benchmark Details
Max Score: 100

Performance Overview
Score distribution and top performers

Score Distribution (18 models)
Top Score: 62.3%
Average Score: 33.7%
High Performers (80%+): 0

Top Organizations

Rank  Organization   Models  Score
#1    Anthropic      5       48.8%
#2    OpenAI         5       40.1%
#3    Zhipu AI       1       34.0%
#4    Amazon         1       24.6%
#5    Moonshot AI    1       23.9%

Leaderboard
18 models ranked by performance on MCP-Atlas
Rank  Date          License      Score
1     Nov 24, 2025  Proprietary  62.3%
2     Feb 17, 2026  Proprietary  61.3%
3     Dec 11, 2025  Proprietary  60.6%
4     Nov 18, 2025  Proprietary  54.1%
5     Nov 12, 2025  Proprietary  44.5%
6     Aug 7, 2025   Proprietary  44.5%
7     Sep 29, 2025  Proprietary  43.8%
8     Apr 16, 2025  Proprietary  43.6%
9     Aug 5, 2025   Proprietary  40.9%
10    May 22, 2025  Proprietary  35.6%

Showing 1 to 10 of 18 models

Additional Metrics
Extended metrics for top models on MCP-Atlas
Model              Score  Avg Coverage
Claude Opus 4.5    62.3   78.5%
GPT-5.2            60.6   80.35%
Gemini 3 Pro       54.1   73.2%
GPT-5.1            44.5   64.65%
GPT-5              44.5   61.75%
Claude Sonnet 4.5  43.8   62.17%
o3                 43.6   66.91%
Claude Opus 4.1    40.9   64.99%
Claude Sonnet 4    35.6   57.35%
GLM-4.5 Air        34.0   60.59%
Nova 2 Lite        24.6   48.8%
Kimi K2            23.9   50.41%
Qwen3-235B-A22B    12.0   29.06%
Gemini 2.5 Pro     8.8    30.77%
GPT-4o             7.2    28.53%
Gemini 2.5 Flash   3.4    17.83%
Llama 4 Maverick   0.8    13.03%