FACTS Grounding

About

FACTS Grounding is a benchmark that evaluates large language models' ability to generate responses that are factually accurate and grounded in a provided document. It comprises 1,719 examples spanning a range of document lengths and user requests, and uses a panel of LLM judges (Gemini 1.5 Pro, GPT-4o, and Claude 3.5 Sonnet) to assess factual accuracy and grounding, i.e. how well each model anchors its response to the source material.
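The page does not spell out how the judges' verdicts are combined. A minimal sketch, assuming each judge issues one binary grounded/not-grounded verdict per response and the benchmark score simply averages across judges and examples (the function name, judge labels, and aggregation rule here are illustrative assumptions, not the benchmark's published procedure):

```python
from statistics import mean

def facts_score(verdicts: dict[str, list[bool]]) -> float:
    """Aggregate per-judge binary grounding verdicts into one score.

    `verdicts` maps judge name -> one True/False verdict per example.
    Each judge's score is the fraction of responses it judged grounded;
    the overall score averages the judges. (Assumed aggregation -- the
    page does not specify the exact rule.)
    """
    per_judge = [mean(v) for v in verdicts.values()]  # bools average as 0/1
    return mean(per_judge)

# Hypothetical verdicts from the three judges over four examples:
score = facts_score({
    "gemini-1.5-pro": [True, True, False, True],
    "gpt-4o": [True, True, True, True],
    "claude-3.5-sonnet": [True, False, False, True],
})
print(round(score, 3))  # → 0.75
```

Using multiple judges and averaging reduces the bias any single judge model would introduce when scoring its own family's outputs.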

Evaluation Stats
Total Models: 9
Organizations: 1
Verified Results: 0
Self-Reported: 9
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers

Score Distribution (9 models)
Top Score: 87.8%
Average Score: 75.7%
High Performers (80%+): 5

Top Organizations
#1 Google: 9 models, 75.7% average
Leaderboard
9 models ranked by performance on FACTS Grounding
Date | License | Score
Jun 5, 2025 | Proprietary | 87.8%
May 20, 2025 | Proprietary | 85.3%
Jun 17, 2025 | Creative Commons Attribution 4.0 License | 84.1%
Dec 1, 2024 | Proprietary | 83.6%
Feb 5, 2025 | Proprietary | 83.6%
Mar 12, 2025 | Gemma | 75.8%
Mar 12, 2025 | Gemma | 74.9%
Mar 12, 2025 | Gemma | 70.1%
Mar 12, 2025 | Gemma | 36.4%
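The summary figures in the Performance Overview follow directly from the nine leaderboard scores; a quick check in Python (the threshold of 80% for "high performers" is taken from the overview heading):

```python
# Summary statistics over the nine self-reported FACTS Grounding scores
# listed in the leaderboard above.
scores = [87.8, 85.3, 84.1, 83.6, 83.6, 75.8, 74.9, 70.1, 36.4]

top = max(scores)                                    # top score
average = round(sum(scores) / len(scores), 1)        # mean, one decimal
high_performers = sum(1 for s in scores if s >= 80)  # scores at 80%+

print(top, average, high_performers)  # → 87.8 75.7 5
```

This reproduces the reported Top Score (87.8%), Average Score (75.7%), and High Performers count (5).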
Resources