HealthBench Hard
text
About
HealthBench Hard is a more difficult variant of the HealthBench medical AI benchmark. It tests a model's ability to handle difficult diagnostic cases, complex clinical reasoning, and advanced medical knowledge, and is intended to probe the limits of medical AI capabilities in demanding healthcare contexts.
Evaluation Stats
Total Models: 3
Organizations: 1
Verified Results: 0
Self-Reported: 3
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers
Score Distribution: 3 models
Top Score: 30.0%
Average Score: 14.1%
High Performers (80%+): 0
Top Organizations
#1 OpenAI: 3 models, 14.1%
Leaderboard
3 models ranked by performance on HealthBench Hard
Date | License | Score
---|---|---
Aug 5, 2025 | Apache 2.0 | 30.0%
Aug 5, 2025 | Apache 2.0 | 10.8%
Aug 7, 2025 | Proprietary | 1.6%
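
The summary figures in the Performance Overview can be checked against the three leaderboard scores. The sketch below is a minimal illustration, assuming the Average Score is a simple unweighted mean and that "High Performers" counts scores at or above 80%. The page does not state how either figure is computed, and the variable names are illustrative only.

```python
# Scores copied from the leaderboard rows, in percent.
scores = [30.0, 10.8, 1.6]

# Assumed cutoff, taken from the "High Performers (80%+)" label.
HIGH_PERFORMER_THRESHOLD = 80.0

top_score = max(scores)
# Assumption: the average is an unweighted arithmetic mean.
average_score = sum(scores) / len(scores)
high_performers = sum(1 for s in scores if s >= HIGH_PERFORMER_THRESHOLD)

print(f"Top score: {top_score:.1f}%")
print(f"Average score: {average_score:.1f}%")
print(f"High performers: {high_performers}")
```

Under these assumptions the results agree with the reported Top Score, Average Score, and High Performers count.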