o1
#1 GPQA Physics
#1 GPQA Biology
#1 GPQA Chemistry
+3 more top rankings
by OpenAI
About
o1 was developed as part of OpenAI's reasoning-focused model series, designed to spend more time thinking before responding. Built to excel at complex reasoning tasks in science, coding, and mathematics, it employs an extended internal reasoning process, working through problems step by step to solve tasks that traditional language models struggle with.
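For reference, here is a minimal sketch of querying the model through OpenAI's Python SDK. The model identifier "o1" and the prompt are assumptions for illustration, not details taken from this page.

```python
# Minimal sketch: sending a reasoning-heavy prompt to o1 via the OpenAI Python SDK.
# Assumes the model is exposed under the identifier "o1" and that an API key is
# available in the OPENAI_API_KEY environment variable.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="o1",  # assumed model identifier
    messages=[
        {
            "role": "user",
            "content": "A ball is dropped from 20 m. How long until it hits the ground?",
        }
    ],
)

print(response.choices[0].message.content)
```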
Pricing Range
Input (per 1M tokens): $15.00 - $15.00
Output (per 1M tokens): $60.00 - $60.00
Providers: 2
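As a quick sanity check on these rates, the sketch below estimates per-request cost from token counts. Only the $15/$60 per-million prices come from the listing above; the token counts in the example are made up.

```python
# Rough cost estimate for a single o1 request, using the listed prices of
# $15.00 per 1M input tokens and $60.00 per 1M output tokens.
INPUT_PRICE_PER_M = 15.00
OUTPUT_PRICE_PER_M = 60.00

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost for one request."""
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_M + \
           (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M

# Hypothetical request: 2,000 prompt tokens and 8,000 completion tokens
# (o1's internal reasoning tokens are billed as output, so completions tend to be large).
print(f"${estimate_cost(2_000, 8_000):.4f}")  # -> $0.5100
```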
Timeline
Announced: Dec 17, 2024
Released: Dec 17, 2024
License & Family
License: Proprietary
Performance Overview
Performance metrics and category breakdown
Overall Performance (19 benchmarks)
Average Score: 71.6%
Best Score: 97.1%
High Performers (80%+): 7

Performance Metrics
Max Context Window: 300.0K tokens
Avg Throughput: 41.0 tok/s
Avg Latency: 8 ms
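A back-of-the-envelope way to read the throughput and latency figures together: total wall-clock time is roughly the first-token latency plus the output length divided by throughput. The sketch below uses the averages listed above; the 1,000-token completion is an arbitrary example.

```python
# Rough generation-time estimate from the listed averages:
# ~8 ms latency and ~41 tokens/second sustained throughput.
LATENCY_S = 0.008
THROUGHPUT_TOK_S = 41.0

def estimated_generation_time(output_tokens: int) -> float:
    """Approximate seconds to stream a completion of the given length."""
    return LATENCY_S + output_tokens / THROUGHPUT_TOK_S

print(f"{estimated_generation_time(1_000):.1f} s")  # ~24.4 s for 1,000 output tokens
```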
All Benchmark Results for o1
Complete list of benchmark scores with detailed information
| Benchmark | Modality | Score | Percentage | Source |
|---|---|---|---|---|
| GSM8k | text | 0.97 | 97.1% | Self-reported |
| MATH | text | 0.96 | 96.4% | Self-reported |
| GPQA Physics | text | 0.93 | 92.8% | Self-reported |
| MMLU | text | 0.92 | 91.8% | Self-reported |
| MGSM | text | 0.89 | 89.3% | Self-reported |
| HumanEval | text | 0.88 | 88.1% | Self-reported |
| MMMLU | text | 0.88 | 87.7% | Self-reported |
| GPQA | text | 0.78 | 78.0% | Self-reported |
| MMMU | multimodal | 0.78 | 77.6% | Self-reported |
| AIME 2024 | text | 0.74 | 74.3% | Self-reported |
Showing 1 to 10 of 19 benchmarks
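The overview figures above (average, best, and count of 80%+ scores) are straightforward aggregates over the per-benchmark results. The sketch below computes them from the ten rows shown here; note that the page's 71.6% average is taken over all 19 benchmarks, which are not fully listed on this page.

```python
# Aggregate the benchmark scores shown in the table above.
# Only the 10 visible rows are included; the page's 71.6% average
# covers all 19 benchmarks, so the average computed here differs.
scores = {
    "GSM8k": 97.1, "MATH": 96.4, "GPQA Physics": 92.8, "MMLU": 91.8,
    "MGSM": 89.3, "HumanEval": 88.1, "MMMLU": 87.7, "GPQA": 78.0,
    "MMMU": 77.6, "AIME 2024": 74.3,
}

average = sum(scores.values()) / len(scores)
best = max(scores.values())
high_performers = sum(1 for s in scores.values() if s >= 80.0)

print(f"Average: {average:.1f}%")                     # over the visible rows only
print(f"Best: {best:.1f}%")                           # 97.1% (GSM8k), matching the overview
print(f"High performers (80%+): {high_performers}")   # 7, matching the overview
```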