NIH/Multi-needle
text
About
NIH/Multi-needle is a long-context retrieval benchmark in the needle-in-a-haystack (NIH) style. Several short facts ("needles") are inserted into a long passage of unrelated text, and the model is scored on how many of them it recalls when queried. Here "NIH" abbreviates "needle in a haystack", not the National Institutes of Health, so the benchmark measures long-context recall rather than medical knowledge.
Evaluation Stats
Total Models: 1
Organizations: 1
Verified Results: 0
Self-Reported: 1
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers
Score Distribution
1 model
Top Score: 84.7%
Average Score: 84.7%
High Performers (80%+): 1
Top Organizations
#1 Meta: 1 model, 84.7%
Leaderboard
1 model ranked by performance on NIH/Multi-needle
Release Date | License | Score |
---|---|---|
Sep 25, 2024 | Llama 3.2 Community License | 84.7% |