All Benchmarks
Explore all 20 benchmarks for evaluating language models across different capabilities and domains.
| Properties | Links | | | | | | |
|---|---|---|---|---|---|---|---|
| Coding | 21 | 9 | 80.9% | 76.4% | | | |
| Agents | 19 | 5 | 74.7% | 8.3% | | | |
| Tool Use | 18 | 8 | 62.3% | 33.7% | | | |
| Coding | 18 | 9 | 51.7% | 43.7% | | | |
| Multimodal | 16 | 9 | 78.4% | 30.1% | | | |
| Reasoning | 15 | 4 | 37.5% | 21.7% | | | |
| Coding | 12 | 4 | 69.9% | 56.3% | | | |
| Finance | 11 | 5 | 63.3% | 56.1% | | | |
| Agents | 11 | 4 | 49.7% | 33.3% | | | |
| Agents | 10 | 6 | 85.4% | 69.3% | | | |

Showing 1 to 10 of 20 benchmarks