TheoremQA
About
TheoremQA is the first theorem-driven question answering benchmark featuring 800 high-quality questions covering 350 theorems across Mathematics, Physics, Electrical Engineering, Computer Science, and Finance. Curated by domain experts, this rigorous evaluation tests AI models' ability to apply theoretical knowledge and mathematical theorems to solve challenging science problems requiring deep understanding and reasoning.
Evaluation Stats
Total Models: 6
Organizations: 1
Verified Results: 0
Self-Reported: 6
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers
Score Distribution: 6 models
Top Score: 44.4%
Average Score: 39.0%
High Performers (80%+): 0
Top Organizations
#1 Alibaba Cloud / Qwen Team: 6 models, 39.0%
Leaderboard
6 models ranked by performance on TheoremQA
Date | License | Score
---|---|---
Jul 23, 2024 | tongyi-qianwen | 44.4%
Sep 19, 2024 | Apache 2.0 | 44.1%
Sep 19, 2024 | Apache 2.0 | 43.1%
Sep 19, 2024 | Apache 2.0 | 43.0%
Sep 19, 2024 | Apache 2.0 | 34.0%
Jul 23, 2024 | Apache 2.0 | 25.3%
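
The summary figures in the Performance Overview can be re-derived from the six scores listed in the leaderboard. The following is a minimal sketch, assuming the average is a plain arithmetic mean rounded to one decimal place and that "High Performers" counts scores at or above 80%. The variable names are illustrative and not part of the site.

```python
from statistics import mean

# Scores (in percent) copied from the leaderboard rows, in listed order.
scores = [44.4, 44.1, 43.1, 43.0, 34.0, 25.3]

# Assumption: the page's average is an unweighted mean rounded to one decimal.
top_score = max(scores)
average_score = round(mean(scores), 1)

# Assumption: "High Performers (80%+)" counts scores >= 80.
high_performers = sum(1 for s in scores if s >= 80.0)

print(f"Top score: {top_score}%")
print(f"Average score: {average_score}%")
print(f"High performers (80%+): {high_performers}")
```

Under these assumptions the computed values agree with the reported top score, average score, and high-performer count.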