Claude 3.5 Sonnet
Multimodal
Zero-eval
#1 AI2D · #1 BIG-Bench Hard · #1 ChartQA (+6 more)
by Anthropic
About
This upgraded version of Claude 3.5 Sonnet was released with significant improvements in coding and agentic tool use. Built to deliver enhanced performance in software engineering tasks, it brought substantial gains in reasoning and problem-solving while introducing the groundbreaking computer use capability in public beta, allowing it to interact with computer interfaces like a human.
Pricing Range
Input (per 1M): $3.00
Output (per 1M): $15.00
Providers: 3
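Given the per-million-token rates listed above, the cost of a single request can be estimated with simple arithmetic. A minimal sketch (the function name and example token counts are illustrative, not from the page):

```python
# Listed rates for Claude 3.5 Sonnet: $3.00 per 1M input tokens,
# $15.00 per 1M output tokens.
INPUT_PER_M = 3.00
OUTPUT_PER_M = 15.00

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of a single request."""
    return (input_tokens / 1_000_000) * INPUT_PER_M \
         + (output_tokens / 1_000_000) * OUTPUT_PER_M

# Example: a 10,000-token prompt with a 2,000-token completion.
print(f"${estimate_cost(10_000, 2_000):.2f}")  # → $0.06
```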
Timeline
Announced: Oct 22, 2024
Released: Oct 22, 2024
Specifications
Capabilities
Multimodal
License & Family
License
Proprietary
Performance Overview
Performance metrics and category breakdown
Overall Performance
19 benchmarks
Average Score: 73.3%
Best Score: 96.4%
High Performers (80%+): 9

Performance Metrics
Max Context Window: 400.0K
Avg Throughput: 81.0 tok/s
Avg Latency: 0ms
All Benchmark Results for Claude 3.5 Sonnet
Complete list of benchmark scores with detailed information
| Benchmark | Modality | Score | Percentage | Source |
|---|---|---|---|---|
| GSM8k | text | 0.96 | 96.4% | Self-reported |
| DocVQA | multimodal | 0.95 | 95.2% | Self-reported |
| AI2D | multimodal | 0.95 | 94.7% | Self-reported |
| HumanEval | text | 0.94 | 93.7% | Self-reported |
| BIG-Bench Hard | text | 0.93 | 93.1% | Self-reported |
| MGSM | text | 0.92 | 91.6% | Self-reported |
| ChartQA | multimodal | 0.91 | 90.8% | Self-reported |
| MMLU | text | 0.90 | 90.4% | Self-reported |
| DROP | text | 0.87 | 87.1% | Self-reported |
| MATH | text | 0.78 | 78.3% | Self-reported |
Showing 1 to 10 of 19 benchmarks
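The 73.3% Average Score reported above spans all 19 benchmarks, while the table shows only the top 10, so the visible rows average higher. A quick sketch of that check:

```python
# Percentages from the ten benchmark rows shown above.
scores = [96.4, 95.2, 94.7, 93.7, 93.1, 91.6, 90.8, 90.4, 87.1, 78.3]

# Mean of the visible top-10 rows; the page's 73.3% figure also
# includes the nine lower-scoring benchmarks not shown here.
mean = sum(scores) / len(scores)
print(f"{mean:.1f}%")  # → 91.1%
```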