LiveBench

About

LiveBench is an LLM benchmark designed to limit test-set contamination by regularly replacing its questions with new ones drawn from recent sources. Because the questions are refreshed, models are unlikely to have memorized the answers from their training data, which keeps the evaluation objective across diverse tasks. The fresh, challenging problems evolve over time and are intended to measure genuine reasoning and knowledge application.

Evaluation Stats
Total Models: 13
Organizations: 4
Verified Results: 0
Self-Reported: 13
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers

Score Distribution

Models: 13
Top Score: 84.6%
Average Score: 63.2%
High Performers (80%+): 1

Top Organizations

#1 Moonshot AI: 2 models, 76.4%
#2 OpenAI: 3 models, 68.0%
#3 Alibaba Cloud / Qwen Team: 7 models, 59.6%
#4 Microsoft: 1 model, 47.6%
Leaderboard
13 models ranked by performance on LiveBench
Date          License      Score
Jan 30, 2025  Proprietary  84.6%
Apr 29, 2025  Apache 2.0   77.1%
Jul 11, 2025  MIT          76.4%
Sep 5, 2025   MIT          76.4%
Apr 29, 2025  Apache 2.0   74.9%
Apr 29, 2025  Apache 2.0   74.3%
Mar 5, 2025   Apache 2.0   73.1%
Dec 17, 2024  Proprietary  67.0%
Sep 19, 2024  Qwen         52.3%
Sep 12, 2024  Proprietary  52.3%
Showing 1 to 10 of 13 models
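
The overview figures can be partly cross-checked against the rows listed here. The following is a minimal sketch, not the site's own calculation: it assumes the ten visible scores are percentages of the maximum score, and the names `visible_scores` and `summarize` are made up for illustration. The three models not shown on this page are left out, so only the top score and the 80%+ count can be compared directly with the overview.

```python
from statistics import mean

# Scores (in percent) copied from the ten leaderboard rows visible on this page.
# The other three of the 13 models are not shown, so they are omitted here.
visible_scores = [84.6, 77.1, 76.4, 76.4, 74.9, 74.3, 73.1, 67.0, 52.3, 52.3]

# The "80%+" cutoff used for "High Performers" in the overview.
HIGH_PERFORMER_THRESHOLD = 80.0


def summarize(scores, threshold=HIGH_PERFORMER_THRESHOLD):
    """Return count, top score, mean (1 decimal), and number of scores >= threshold."""
    return {
        "count": len(scores),
        "top": max(scores),
        "average": round(mean(scores), 1),
        "high_performers": sum(1 for s in scores if s >= threshold),
    }


summary = summarize(visible_scores)
print(summary)
```

The top score (84.6%) and the single high performer match the overview. The mean of the ten visible rows comes out higher than the reported 63.2% average, which is consistent with the three unlisted, lower-ranked models pulling the 13-model average down.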
Resources