Claude 3 Opus
Multimodal
Zero-eval
#1 HellaSwag
#2 ARC-C
by Anthropic
About
Claude 3 Opus was developed as the most capable model in the Claude 3 family, designed to set new industry benchmarks across a wide range of cognitive tasks. Built to handle complex analysis and extended tasks requiring deep reasoning, it balanced frontier intelligence with careful safety considerations, representing the flagship tier of the Claude 3 generation.
Pricing Range
Input (per 1M tokens): $15.00
Output (per 1M tokens): $75.00
Providers: 3
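The listed rates ($15.00 per 1M input tokens, $75.00 per 1M output tokens) make per-request cost a simple linear function of token counts. A minimal sketch, assuming the flat per-million rates above (the function name and example token counts are illustrative, not from the source):

```python
# Per-million-token rates for Claude 3 Opus, as listed above.
INPUT_PER_M = 15.00   # USD per 1M input tokens
OUTPUT_PER_M = 75.00  # USD per 1M output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request at the listed flat rates."""
    return (input_tokens / 1_000_000) * INPUT_PER_M \
         + (output_tokens / 1_000_000) * OUTPUT_PER_M

# Example: 10K input tokens and 2K output tokens.
print(f"${request_cost(10_000, 2_000):.2f}")  # → $0.30
```

Note that output tokens cost 5x as much as input tokens at these rates, so long generations dominate the bill even for prompt-heavy workloads.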
Timeline
Announced: Feb 29, 2024
Released: Feb 29, 2024
Specifications
Capabilities
Multimodal
License & Family
License
Proprietary
Performance Overview
Performance metrics and category breakdown
Overall Performance (11 benchmarks)
Average Score: 81.6%
Best Score: 96.4%
High Performers (80%+): 8

Performance Metrics
Max Context Window: 400.0K
Avg Throughput: 87.3 tok/s
Avg Latency: 0ms
All Benchmark Results for Claude 3 Opus
Complete list of benchmark scores with detailed information
| Benchmark | Modality | Score | Percentage | Source |
|---|---|---|---|---|
| ARC-C | text | 0.96 | 96.4% | Self-reported |
| HellaSwag | text | 0.95 | 95.4% | Self-reported |
| GSM8k | text | 0.95 | 95.0% | Self-reported |
| MGSM | text | 0.91 | 90.7% | Self-reported |
| MMLU | text | 0.87 | 86.8% | Self-reported |
| BIG-Bench Hard | text | 0.87 | 86.8% | Self-reported |
| HumanEval | text | 0.85 | 84.9% | Self-reported |
| DROP | text | 0.83 | 83.1% | Self-reported |
| MMLU-Pro | text | 0.69 | 68.5% | Self-reported |
| MATH | text | 0.60 | 60.1% | Self-reported |

Showing 1 to 10 of 11 benchmarks