FACTS Grounding
About
FACTS Grounding is a benchmark that evaluates Large Language Models' ability to generate factually accurate responses grounded in provided documents. Featuring 1,719 diverse examples across various document lengths and user requests, this benchmark uses multiple LLM judges (Gemini 1.5 Pro, GPT-4o, Claude 3.5 Sonnet) to assess factual accuracy and grounding capabilities, measuring how well models anchor their responses to source material.
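The multi-judge setup described above can be sketched as follows. This is a minimal illustration, not the official FACTS Grounding implementation: the judge list comes from the benchmark description, but `judge_verdict` and the simple average-over-judges aggregation are assumptions for the sake of the example.

```python
# Hypothetical sketch of multi-judge grounding evaluation.
# The three judge models are named in the benchmark description; everything
# else (verdict format, aggregation rule) is an assumption.

JUDGES = ["gemini-1.5-pro", "gpt-4o", "claude-3.5-sonnet"]

def judge_verdict(judge: str, document: str, response: str) -> bool:
    """Placeholder: a real judge would be an LLM call that checks whether
    every claim in `response` is supported by `document`."""
    raise NotImplementedError

def grounding_score(examples, verdict_fn=judge_verdict):
    """Average grounded/not-grounded verdict across all examples and judges."""
    verdicts = [
        verdict_fn(judge, ex["document"], ex["response"])
        for ex in examples
        for judge in JUDGES
    ]
    return sum(verdicts) / len(verdicts)
```

Averaging over several judge models, rather than trusting one, reduces the bias any single evaluator model might have toward its own output style.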
Evaluation Stats
Total Models: 9
Organizations: 1
Verified Results: 0
Self-Reported: 9
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers
Score Distribution: 9 models
Top Score: 87.8%
Average Score: 75.7%
High Performers (80%+): 5

Top Organizations
#1 Google — 9 models, 75.7% average
Leaderboard
9 models ranked by performance on FACTS Grounding
| Date | License | Score |
|---|---|---|
| Jun 5, 2025 | Proprietary | 87.8% |
| May 20, 2025 | Proprietary | 85.3% |
| Jun 17, 2025 | Creative Commons Attribution 4.0 License | 84.1% |
| Dec 1, 2024 | Proprietary | 83.6% |
| Feb 5, 2025 | Proprietary | 83.6% |
| Mar 12, 2025 | Gemma | 75.8% |
| Mar 12, 2025 | Gemma | 74.9% |
| Mar 12, 2025 | Gemma | 70.1% |
| Mar 12, 2025 | Gemma | 36.4% |
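The summary statistics in the Performance Overview can be reproduced directly from the nine scores listed in the table:

```python
# Scores from the leaderboard table above, in ranked order.
scores = [87.8, 85.3, 84.1, 83.6, 83.6, 75.8, 74.9, 70.1, 36.4]

top = max(scores)                          # top score: 87.8%
avg = round(sum(scores) / len(scores), 1)  # average score: 75.7%
high = sum(s >= 80 for s in scores)        # high performers (80%+): 5
```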