LiveBench
text
About
LiveBench is an LLM benchmark designed to limit test set contamination by drawing regularly updated questions from recent sources, so that models cannot simply memorize answers from their training data. It scores models objectively across a diverse set of tasks and aims to measure reasoning and knowledge application through fresh, challenging problems that change over time.
Evaluation Stats
Total Models: 13
Organizations: 4
Verified Results: 0
Self-Reported: 13
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers
Score Distribution: 13 models
Top Score: 84.6%
Average Score: 63.2%
High Performers (80%+): 1
Top Organizations
#1 Moonshot AI, 2 models, 76.4%
#2 OpenAI, 3 models, 68.0%
#3 Alibaba Cloud / Qwen Team, 7 models, 59.6%
#4 Microsoft, 1 model, 47.6%
Leaderboard
13 models ranked by performance on LiveBench
Date | License | Score
---|---|---
Jan 30, 2025 | Proprietary | 84.6%
Apr 29, 2025 | Apache 2.0 | 77.1%
Jul 11, 2025 | MIT | 76.4%
Sep 5, 2025 | MIT | 76.4%
Apr 29, 2025 | Apache 2.0 | 74.9%
Apr 29, 2025 | Apache 2.0 | 74.3%
Mar 5, 2025 | Apache 2.0 | 73.1%
Dec 17, 2024 | Proprietary | 67.0%
Sep 19, 2024 | Qwen | 52.3%
Sep 12, 2024 | Proprietary | 52.3%
Showing 1 to 10 of 13 models
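
The summary figures can be partly cross-checked against the ten rows shown in the leaderboard. The following is a minimal Python sketch, not part of LiveBench itself; the function name `summarize` and the 80% threshold parameter are illustrative assumptions. Because only 10 of the 13 models are listed, the mean computed here covers the visible rows only and will not match the page-wide average of 63.2%.

```python
from statistics import mean

# Scores (in percent) copied from the ten visible leaderboard rows.
visible_scores = [84.6, 77.1, 76.4, 76.4, 74.9, 74.3, 73.1, 67.0, 52.3, 52.3]


def summarize(scores_pct, high_threshold=80.0):
    """Summarize a list of percentage scores.

    `summarize` and `high_threshold` are hypothetical names for illustration.
    """
    return {
        "top": max(scores_pct),
        "mean": round(mean(scores_pct), 2),
        # Count of scores at or above the "high performer" threshold.
        "high_performers": sum(s >= high_threshold for s in scores_pct),
    }


summary = summarize(visible_scores)

# The page lists "Max Score: 1", so percentages map to fractions of 1.
fractions = [round(s / 100, 3) for s in visible_scores]

print(summary)
print(fractions)
```

The top score and the count of models at or above 80% agree with the page (84.6% and 1). The gap between the visible-row mean and the reported 63.2% average implies that the three models not shown score well below the listed ones.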