Nexus
About
Nexus is a comprehensive benchmark designed to evaluate models across multiple interconnected tasks and domains. It tests a system's ability to handle complex, multi-faceted problems that require integrating different capabilities, reasoning across task boundaries, and maintaining consistency across diverse evaluation scenarios within a unified assessment framework.
Evaluation Stats
Total Models: 4
Organizations: 1
Verified Results: 0
Self-Reported: 4
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers
Score Distribution: 4 models
Top Score: 58.7%
Average Score: 47.0%
High Performers (80%+): 0

Top Organizations
#1 Meta: 4 models, 47.0% average
Leaderboard
4 models ranked by performance on Nexus
Release Date | License | Score
---|---|---
Jul 23, 2024 | Llama 3.1 Community License | 58.7%
Jul 23, 2024 | Llama 3.1 Community License | 56.7%
Jul 23, 2024 | Llama 3.1 Community License | 38.5%
Sep 25, 2024 | Llama 3.2 Community License | 34.3%
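The summary statistics in the Performance Overview follow directly from the four self-reported scores above. As a minimal sketch, the top score, mean, and high-performer count can be recomputed from the leaderboard values (the 80% threshold is the one the page itself uses):

```python
# Self-reported Nexus scores from the leaderboard table, in percent.
scores = [58.7, 56.7, 38.5, 34.3]

top = max(scores)                               # best single result
average = sum(scores) / len(scores)             # mean across all 4 models
high_performers = sum(s >= 80.0 for s in scores)  # models at or above 80%

print(f"Top: {top}%  Average: {average:.1f}%  High performers (80%+): {high_performers}")
```

This reproduces the overview figures: a top score of 58.7%, an average of about 47.0%, and no model reaching the 80% high-performer bar.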