FACTS Grounding

About

FACTS Grounding is a benchmark that evaluates large language models' ability to generate responses that are factually accurate and grounded in a provided document. It comprises 1,719 examples spanning a range of document lengths and user requests, and uses a panel of LLM judges (Gemini 1.5 Pro, GPT-4o, and Claude 3.5 Sonnet) to assess factual accuracy and grounding, i.e. how well each model anchors its response to the source material.
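The page does not spell out how the judges' verdicts are combined. A minimal sketch, assuming each judge issues one binary grounded/not-grounded verdict per response and the benchmark score simply averages across judges and examples (the function name, judge labels, and aggregation rule here are illustrative assumptions, not the benchmark's published procedure):

```python
from statistics import mean

def facts_score(verdicts: dict[str, list[bool]]) -> float:
    """Aggregate per-judge binary grounding verdicts into one score.

    `verdicts` maps judge name -> one True/False verdict per example.
    Each judge's score is the fraction of responses it judged grounded;
    the overall score averages the judges. (Assumed aggregation -- the
    page does not specify the exact rule.)
    """
    per_judge = [mean(v) for v in verdicts.values()]  # bools average as 0/1
    return mean(per_judge)

# Hypothetical verdicts from the three judges over four examples:
score = facts_score({
    "gemini-1.5-pro": [True, True, False, True],
    "gpt-4o": [True, True, True, True],
    "claude-3.5-sonnet": [True, False, False, True],
})
print(round(score, 3))  # → 0.75
```

Using multiple judges and averaging reduces the bias any single judge model would introduce when scoring its own family's outputs.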

Evaluation Stats
Total Models: 9
Organizations: 1
Verified Results: 0
Self-Reported: 9
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers

Score Distribution (9 models)
Top Score: 87.8%
Average Score: 75.7%
High Performers (80%+): 5

Top Organizations
#1 Google: 9 models, 75.7% average
Leaderboard
9 models ranked by performance on FACTS Grounding
Date | License | Score
Jun 5, 2025 | Proprietary | 87.8%
May 20, 2025 | Proprietary | 85.3%
Jun 17, 2025 | Creative Commons Attribution 4.0 License | 84.1%
Dec 1, 2024 | Proprietary | 83.6%
Feb 5, 2025 | Proprietary | 83.6%
Mar 12, 2025 | Gemma | 75.8%
Mar 12, 2025 | Gemma | 74.9%
Mar 12, 2025 | Gemma | 70.1%
Mar 12, 2025 | Gemma | 36.4%
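The summary figures in the Performance Overview follow directly from the nine leaderboard scores; a quick check in Python (the threshold of 80% for "high performers" is taken from the overview heading):

```python
# Summary statistics over the nine self-reported FACTS Grounding scores
# listed in the leaderboard above.
scores = [87.8, 85.3, 84.1, 83.6, 83.6, 75.8, 74.9, 70.1, 36.4]

top = max(scores)                                    # top score
average = round(sum(scores) / len(scores), 1)        # mean, one decimal
high_performers = sum(1 for s in scores if s >= 80)  # scores at 80%+

print(top, average, high_performers)  # → 87.8 75.7 5
```

This reproduces the reported Top Score (87.8%), Average Score (75.7%), and High Performers count (5).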
Resources