NIH/Multi-needle

About

NIH/Multi-needle is a long-context retrieval benchmark that uses the multi-needle variant of the needle-in-a-haystack (NIH) test. Several short pieces of information (the "needles") are placed at different positions in a long context (the "haystack"), and the model is scored on how many of them it can locate and retrieve.

Evaluation Stats
Total Models: 1
Organizations: 1
Verified Results: 0
Self-Reported: 1
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers

Score Distribution

Models: 1
Top Score: 84.7%
Average Score: 84.7%
High Performers (80%+): 1

Top Organizations

#1 Meta (1 model): 84.7%
Leaderboard
1 model ranked by performance on NIH/Multi-needle
Date: Sep 25, 2024
License: Llama 3.2 Community License
Score: 84.7%
Resources