TruthfulQA

About

TruthfulQA is a benchmark that measures the truthfulness of language models' generated answers to 817 questions spanning 38 categories, including health, law, finance, and politics. The questions are written so that some humans would answer them falsely due to common misconceptions; evaluations on this benchmark have shown that larger models often generate more false answers, highlighting critical challenges in AI truthfulness and factual accuracy.
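
For readers who want to inspect the underlying questions, here is a minimal sketch of loading and exploring the dataset, assuming the copy published on the Hugging Face Hub under the identifier "truthful_qa" (the "generation" config exposes the 817 open-ended questions; field names such as category and best_answer follow that release):

```python
from collections import Counter

from datasets import load_dataset

# TruthfulQA ships only a validation split; the benchmark is intended to be
# evaluated zero-shot, so there is no training set.
ds = load_dataset("truthful_qa", "generation", split="validation")

print(len(ds))                  # expected: 817 questions
print(Counter(ds["category"]))  # question counts per category (health, law, ...)

# Each row pairs a misconception-prone question with reference answers.
example = ds[0]
print(example["question"])
print(example["best_answer"])
```

The same dataset also has a "multiple_choice" config, which carries the answer choices used for accuracy-style (MC1/MC2) scoring.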

Evaluation Stats
Total Models: 16
Organizations: 7
Verified Results: 0
Self-Reported: 16
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers

Score Distribution (16 models)
Top Score: 77.5%
Average Score: 58.7%
High Performers (80%+): 0

Top Organizations

#1 Microsoft: 3 models, 69.3%
#2 IBM: 3 models, 59.0%
#3 NVIDIA: 1 model, 58.6%
#4 Cohere: 1 model, 56.3%
#5 AI21 Labs: 2 models, 56.2%
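
The overview numbers above can be reproduced from the individual self-reported results. Below is a minimal sketch of that aggregation, assuming each result is stored as an (organization, score) pair with the score normalized to the benchmark's max score of 1 and displayed as a percentage; the demo entries are hypothetical placeholders, not the actual leaderboard rows:

```python
from collections import defaultdict

def summarize(entries, high_cutoff=0.80):
    """Compute the distribution summary and a per-organization ranking."""
    scores = [score for _, score in entries]
    top = max(scores)
    average = sum(scores) / len(scores)
    high_performers = sum(score >= high_cutoff for score in scores)

    # Average each organization's models, as in the "Top Organizations" list.
    by_org = defaultdict(list)
    for org, score in entries:
        by_org[org].append(score)
    ranking = sorted(
        ((org, sum(v) / len(v), len(v)) for org, v in by_org.items()),
        key=lambda row: row[1],
        reverse=True,
    )
    return top, average, high_performers, ranking

# Hypothetical entries for illustration (the real leaderboard has 16 models).
demo = [("Org A", 0.775), ("Org A", 0.640), ("Org B", 0.590), ("Org C", 0.563)]
top, avg, high, ranking = summarize(demo)
print(f"Top {top:.1%}, average {avg:.1%}, {high} models at 80%+")
for i, (org, mean, n) in enumerate(ranking, 1):
    print(f"#{i} {org}: {n} model(s), {mean:.1%}")
```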
Leaderboard
16 models ranked by performance on TruthfulQA
Date | License | Score
Aug 23, 2024 | MIT | 77.5%
Apr 16, 2025 | Apache 2.0 | 66.9%
Feb 1, 2025 | MIT | 66.4%
Aug 23, 2024 | MIT | 64.0%
Oct 1, 2024 | Llama 3.1 Community License | 58.6%
Sep 19, 2024 | Apache 2.0 | 58.4%
Aug 22, 2024 | Jamba Open Model License | 58.3%
May 2, 2025 | Apache 2.0 | 58.1%
Sep 19, 2024 | Apache 2.0 | 57.8%
Aug 30, 2024 | CC BY-NC | 56.3%
Showing 1 to 10 of 16 models
Resources