
Claude 3.5 Sonnet

Multimodal · Zero-eval
#1 on AI2D, BIG-Bench Hard, ChartQA, and 6 more benchmarks

by Anthropic

About

This upgraded version of Claude 3.5 Sonnet was released with significant improvements in coding and agentic tool use. Built to deliver enhanced performance in software engineering tasks, it brought substantial gains in reasoning and problem-solving while introducing the groundbreaking computer use capability in public beta, allowing it to interact with computer interfaces like a human.
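
As a concrete illustration of ordinary API usage (computer use itself is a separate beta tool type and is not shown here), the sketch below calls this model through the official anthropic Python SDK. The dated model ID is the October 22, 2024 snapshot, the prompt is made up, and the API key is assumed to be available in the ANTHROPIC_API_KEY environment variable.

```python
# Minimal sketch: calling the upgraded Claude 3.5 Sonnet via the anthropic Python SDK.
# Assumes ANTHROPIC_API_KEY is set in the environment; the prompt is illustrative only.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

message = client.messages.create(
    model="claude-3-5-sonnet-20241022",  # the October 22, 2024 snapshot
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Refactor this function to remove the nested loops: ..."}
    ],
)

# The response content is a list of blocks; text blocks carry the generated answer.
print(message.content[0].text)
```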

Pricing Range
Input (per 1M tokens): $3.00
Output (per 1M tokens): $15.00
Providers: 3
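
To make the rates concrete, here is a minimal cost-estimate sketch. The per-million prices come from the range above; the token counts in the example are hypothetical.

```python
# Back-of-the-envelope cost estimate from the listed rates.
# Rates are per 1M tokens; the token counts below are hypothetical examples.
INPUT_PRICE_PER_M = 3.00    # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 15.00  # USD per 1M output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of a single request."""
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_M \
        + (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M

# Example: a 2,000-token prompt with an 800-token completion.
print(f"${request_cost(2_000, 800):.4f}")  # -> $0.0180
```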
Timeline
Announced: Oct 22, 2024
Released: Oct 22, 2024
Specifications
Capabilities: Multimodal
License & Family
License: Proprietary
Performance Overview
Performance metrics and category breakdown

Overall Performance (19 benchmarks)
Average Score: 73.3%
Best Score: 96.4%
High Performers (80%+): 9

Performance Metrics

Max Context Window: 400.0K tokens
Avg Throughput: 81.0 tok/s
Avg Latency: 0 ms
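
The throughput figure gives a rough lower bound on generation time. The sketch below assumes decoding speed is the only bottleneck and ignores time-to-first-token, so it is an estimate rather than a measured latency.

```python
# Rough generation-time estimate from the listed average throughput.
# Assumes decoding speed is the only bottleneck and ignores time-to-first-token,
# so this is a lower bound, not a measured end-to-end latency.
AVG_THROUGHPUT_TOK_PER_S = 81.0  # from the metrics above

def min_generation_seconds(output_tokens: int) -> float:
    return output_tokens / AVG_THROUGHPUT_TOK_PER_S

print(f"{min_generation_seconds(1_000):.1f} s")  # ~12.3 s for 1,000 output tokens
```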
All Benchmark Results for Claude 3.5 Sonnet
Complete list of benchmark scores with detailed information
Benchmark | Modality | Score | Source
GSM8k | text | 96.4% | Self-reported
DocVQA | multimodal | 95.2% | Self-reported
AI2D | multimodal | 94.7% | Self-reported
HumanEval | text | 93.7% | Self-reported
BIG-Bench Hard | text | 93.1% | Self-reported
MGSM | text | 91.6% | Self-reported
ChartQA | multimodal | 90.8% | Self-reported
MMLU | text | 90.4% | Self-reported
DROP | text | 87.1% | Self-reported
MATH | text | 78.3% | Self-reported
Showing the top 10 of 19 benchmarks.