TriviaQA
About
TriviaQA is a large-scale reading comprehension benchmark of over 650,000 question-answer-evidence triples built from 95,000 question-answer pairs authored by trivia enthusiasts. Because the evidence documents are gathered independently of the questions (a form of distant supervision), the benchmark tests a model's ability to locate and comprehend relevant information in order to answer factual questions across diverse knowledge domains.
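
For readers who want to inspect the triples directly, here is a minimal sketch using the Hugging Face `datasets` library. The `trivia_qa` dataset id, the `rc` configuration name, and the field names below assume the Hub copy of the dataset:

```python
# Minimal sketch: inspect TriviaQA question-answer-evidence triples.
# Assumes the Hugging Face Hub copy of the dataset ("trivia_qa", config "rc",
# the reading-comprehension setting that pairs questions with evidence documents).
from datasets import load_dataset

trivia_qa = load_dataset("trivia_qa", "rc", split="validation")

example = trivia_qa[0]
print(example["question"])                # the trivia question
print(example["answer"]["value"])         # canonical answer string
print(example["answer"]["aliases"][:5])   # accepted answer aliases
print(example["entity_pages"]["title"])   # titles of Wikipedia evidence pages
```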
Evaluation Stats
Total Models: 13
Organizations: 4
Verified Results: 0
Self-Reported: 13
Benchmark Details
Max Score: 1
Language: en
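
Leaderboard scores are fractions of this maximum score of 1, shown as percentages. TriviaQA is conventionally scored with normalized exact match against the accepted answer aliases; the sketch below shows that convention, since this leaderboard's exact scoring script is not stated:

```python
import re
import string

def normalize(text: str) -> str:
    """Lowercase, drop articles and punctuation, collapse whitespace
    (the SQuAD/TriviaQA-style answer normalization)."""
    text = text.lower()
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    text = "".join(ch for ch in text if ch not in set(string.punctuation))
    return " ".join(text.split())

def exact_match(prediction: str, gold_aliases: list[str]) -> float:
    """1.0 if the normalized prediction matches any accepted alias, else 0.0."""
    return float(any(normalize(prediction) == normalize(g) for g in gold_aliases))

# A leaderboard score like 85.1% is the mean exact match over all questions.
preds = [("Neil Armstrong", ["Neil Armstrong", "Armstrong"]),
         ("the Pacific", ["Pacific Ocean"])]
score = sum(exact_match(p, gs) for p, gs in preds) / len(preds)
print(f"{score:.1%}")  # 50.0%
```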
Performance Overview
Score distribution and top performers
Score Distribution: 13 models
Top Score: 85.1%
Average Score: 74.3%
High Performers (80%+): 5

Top Organizations
#1 Moonshot AI (1 model): 85.1%
#2 IBM (1 model): 78.2%
#3 Mistral AI (5 models): 76.1%
#4 Google (6 models): 70.4%
Leaderboard
13 models ranked by performance on TriviaQA
| Release Date | License | Score |
|---|---|---|
| Jul 11, 2025 | MIT | 85.1% |
| Jun 27, 2024 | Gemma | 83.7% |
| Mar 17, 2025 | Apache 2.0 | 80.5% |
| Mar 17, 2025 | Apache 2.0 | 80.5% |
| Jan 30, 2025 | Apache 2.0 | 80.3% |
| Apr 16, 2025 | Apache 2.0 | 78.2% |
| Jun 27, 2024 | Gemma | 76.6% |
| Jul 18, 2024 | Apache 2.0 | 73.8% |
| May 20, 2025 | Gemma | 70.2% |
| Jun 26, 2025 | Proprietary | 70.2% |
Showing 1 to 10 of 13 models