Nexus
About
Nexus is a comprehensive benchmark designed to evaluate models across multiple interconnected tasks and domains. It tests a system's ability to handle complex, multi-faceted problems that require integrating different capabilities, reasoning across task boundaries, and maintaining consistency across diverse evaluation scenarios within a unified assessment framework.
Evaluation Stats
Total Models: 4
Organizations: 1
Verified Results: 0
Self-Reported: 4
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers
Score Distribution: 4 models
Top Score: 58.7%
Average Score: 47.0%
High Performers (80%+): 0

Top Organizations
#1 Meta: 4 models, 47.0% average
Leaderboard
4 models ranked by performance on Nexus
Release Date | License | Score
---|---|---
Jul 23, 2024 | Llama 3.1 Community License | 58.7%
Jul 23, 2024 | Llama 3.1 Community License | 56.7%
Jul 23, 2024 | Llama 3.1 Community License | 38.5%
Sep 25, 2024 | Llama 3.2 Community License | 34.3%
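The summary statistics in the Performance Overview follow directly from the four self-reported scores above. As a minimal sketch, the top score, mean, and high-performer count can be recomputed from the leaderboard values (the 80% threshold is the one the page itself uses):

```python
# Self-reported Nexus scores from the leaderboard table, in percent.
scores = [58.7, 56.7, 38.5, 34.3]

top = max(scores)                               # best single result
average = sum(scores) / len(scores)             # mean across all 4 models
high_performers = sum(s >= 80.0 for s in scores)  # models at or above 80%

print(f"Top: {top}%  Average: {average:.1f}%  High performers (80%+): {high_performers}")
```

This reproduces the overview figures: a top score of 58.7%, an average of about 47.0%, and no model reaching the 80% high-performer bar.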