MCP-Atlas
Tool Use
About
MCP-Atlas evaluates language models on real-world tool use through the Model Context Protocol across 36 MCP servers and 220 tools, testing multi-step workflows requiring sequential tool orchestration.
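To make "sequential tool orchestration" concrete, here is a minimal sketch of the kind of multi-step workflow the benchmark tests: each tool call depends on the output of the previous one. The tools, task, and registry here are hypothetical illustrations, not actual MCP-Atlas servers or the MCP SDK.

```python
from typing import Any, Callable

# Toy "MCP server" tool registry: tool name -> callable.
# These three tools are invented for illustration only.
TOOLS: dict[str, Callable[..., Any]] = {
    "search_files": lambda query: ["report_q3.csv"],
    "read_file": lambda path: "revenue,1200\ncost,800",
    "compute_sum": lambda rows, column: sum(r[column] for r in rows),
}

def call_tool(name: str, **kwargs: Any) -> Any:
    """Dispatch a single tool call, as an MCP client would."""
    return TOOLS[name](**kwargs)

def run_workflow() -> int:
    # Step 1: locate the relevant file.
    paths = call_tool("search_files", query="Q3 report")
    # Step 2: read its contents (requires step 1's output).
    text = call_tool("read_file", path=paths[0])
    rows = [{"label": line.split(",")[0], "value": int(line.split(",")[1])}
            for line in text.splitlines()]
    # Step 3: aggregate (requires step 2's output).
    return call_tool("compute_sum", rows=rows, column="value")

print(run_workflow())  # 2000
```

A model scores well on tasks like this only if it sequences the calls correctly and threads intermediate results through; skipping or reordering a step breaks the chain.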
Evaluation Stats
- Total Models: 18
- Organizations: 8
- Verified Results: 0
- Self-Reported: 0
Benchmark Details
- Max Score: 100
Performance Overview
Score distribution and top performers
Score Distribution (18 models)
- Top Score: 62.3%
- Average Score: 33.7%
- High Performers (80%+): 0

Top Organizations
| Rank | Organization | Models | Avg Score |
|---|---|---|---|
| 1 | Anthropic | 5 | 48.8% |
| 2 | OpenAI | 5 | 40.1% |
| 3 | Zhipu AI | 1 | 34.0% |
| 4 | Amazon | 1 | 24.6% |
| 5 | Moonshot AI | 1 | 23.9% |
Leaderboard
18 models ranked by performance on MCP-Atlas
| Release Date | License | Score |
|---|---|---|
| Nov 24, 2025 | Proprietary | 62.3% |
| Feb 17, 2026 | Proprietary | 61.3% |
| Dec 11, 2025 | Proprietary | 60.6% |
| Nov 18, 2025 | Proprietary | 54.1% |
| Nov 12, 2025 | Proprietary | 44.5% |
| Aug 7, 2025 | Proprietary | 44.5% |
| Sep 29, 2025 | Proprietary | 43.8% |
| Apr 16, 2025 | Proprietary | 43.6% |
| Aug 5, 2025 | Proprietary | 40.9% |
| May 22, 2025 | Proprietary | 35.6% |
Showing the top 10 of 18 models.
Additional Metrics
Extended metrics for top models on MCP-Atlas
| Model | Score | Avg Coverage |
|---|---|---|
| Claude Opus 4.5 | 62.3 | 78.5% |
| GPT-5.2 | 60.6 | 80.35% |
| Gemini 3 Pro | 54.1 | 73.2% |
| GPT-5.1 | 44.5 | 64.65% |
| GPT-5 | 44.5 | 61.75% |
| Claude Sonnet 4.5 | 43.8 | 62.17% |
| o3 | 43.6 | 66.91% |
| Claude Opus 4.1 | 40.9 | 64.99% |
| Claude Sonnet 4 | 35.6 | 57.35% |
| GLM-4.5 Air | 34.0 | 60.59% |
| Nova 2 Lite | 24.6 | 48.8% |
| Kimi K2 | 23.9 | 50.41% |
| Qwen3-235B-A22B | 12.0 | 29.06% |
| Gemini 2.5 Pro | 8.8 | 30.77% |
| GPT-4o | 7.2 | 28.53% |
| Gemini 2.5 Flash | 3.4 | 17.83% |
| Llama 4 Maverick | 0.8 | 13.03% |
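As a sanity check, the Performance Overview aggregates can be recomputed from the per-model scores on this page: the 17 scores in the metrics table above plus the 61.3 entry that appears only in the leaderboard (its model name is not captured there).

```python
# Per-model scores taken from this page's tables.
scores = [62.3, 61.3, 60.6, 54.1, 44.5, 44.5, 43.8, 43.6, 40.9, 35.6,
          34.0, 24.6, 23.9, 12.0, 8.8, 7.2, 3.4, 0.8]

top = max(scores)                           # top score: 62.3
avg = round(sum(scores) / len(scores), 1)   # average score: 33.7
high = sum(s >= 80 for s in scores)         # high performers (80%+): 0

print(top, avg, high)  # 62.3 33.7 0
```

The recomputed values match the reported Top Score (62.3%), Average Score (33.7%), and the count of zero models at 80% or above.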