GSM8K Chat
About
GSM8K Chat is a conversational variant of the Grade School Math 8K benchmark that evaluates AI models' mathematical reasoning abilities through interactive dialogue. This benchmark tests models' capability to solve math word problems while maintaining conversational context, handling follow-up questions, and providing explanations in a natural dialogue format, measuring both mathematical competency and conversational coherence.
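The page does not describe how replies are scored, so the following is only a minimal sketch of one common approach for GSM8K-style problems: take the last number in each model reply and compare it with the reference answer. The function names `extract_final_number` and `accuracy`, the regular expression, and the sample replies are all illustrative assumptions, not the benchmark's actual harness.

```python
import re


def extract_final_number(reply: str):
    """Return the last number appearing in a reply as a float, or None.

    Assumption: the final number in the reply is the model's answer.
    """
    matches = re.findall(r"-?\d[\d,]*(?:\.\d+)?", reply)
    if not matches:
        return None
    # Drop thousands separators such as "1,250" before converting.
    return float(matches[-1].replace(",", ""))


def accuracy(replies, references):
    """Fraction of replies whose extracted answer matches the reference."""
    correct = 0
    for reply, ref in zip(replies, references):
        answer = extract_final_number(reply)
        if answer is not None and abs(answer - ref) < 1e-6:
            correct += 1
    return correct / len(references)


# Hypothetical replies and reference answers, for illustration only.
replies = [
    "Natalia sold 48 + 24 = 72 clips.",
    "I am not sure.",
    "The total is $1,250.",
]
references = [72, 5, 1250]
score = accuracy(replies, references)
```

Under this sketch a score lies between 0 and 1, which fits the maximum score of 1 listed for the benchmark. A real conversational evaluation would also need to judge follow-up turns and explanation quality, which this exact-match check does not cover.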
Evaluation Stats
Total Models: 1
Organizations: 1
Verified Results: 0
Self-Reported: 1
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers
Score Distribution: 1 model
Top Score: 81.9%
Average Score: 81.9%
High Performers (80%+): 1
Top Organizations
#1 NVIDIA: 1 model, 81.9%
Leaderboard
1 model ranked by performance on GSM8K Chat
Release Date | License | Score
---|---|---
Oct 1, 2024 | Llama 3.1 Community License | 81.9%