
o1
Zero-eval
#1 GPQA Physics · #1 GPQA Biology · #1 GPQA Chemistry (+3 more)
by OpenAI
About
o1 is a language model developed by OpenAI and released in December 2024. It achieves strong performance, with an average score of 71.6% across 19 benchmarks, and excels particularly at GSM8k (97.1%), MATH (96.4%), and GPQA Physics (92.8%). It supports a 300K-token context window for handling large documents and is available through 2 API providers.
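Since the page notes API availability, below is a minimal sketch of querying the model through the OpenAI Python SDK. The model identifier "o1" and the prompt are assumptions for illustration; check your provider's model list for the exact name.

```python
# Minimal sketch: calling o1 through the OpenAI Python SDK.
# Assumption: the model is served under the identifier "o1".
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="o1",
    messages=[{"role": "user", "content": "Summarize the photoelectric effect in two sentences."}],
)
print(response.choices[0].message.content)
```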
Pricing
Input (per 1M tokens): $15.00
Output (per 1M tokens): $60.00
Providers: 2
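As a worked example of these rates (the request sizes are hypothetical, not data from this page): 2,000 input tokens cost 2,000 / 1,000,000 × $15.00 = $0.03, and 1,000 output tokens cost 1,000 / 1,000,000 × $60.00 = $0.06, for $0.09 total.

```python
# Cost estimate from the listed per-1M-token rates.
INPUT_PRICE = 15.00   # USD per 1M input tokens
OUTPUT_PRICE = 60.00  # USD per 1M output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a single request."""
    return (input_tokens / 1e6) * INPUT_PRICE + (output_tokens / 1e6) * OUTPUT_PRICE

print(f"${request_cost(2_000, 1_000):.2f}")  # $0.09 -> $0.03 input + $0.06 output
```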
Timeline
Announced: Dec 17, 2024
Released: Dec 17, 2024
License & Family
License
Proprietary
Performance Overview
Performance metrics and category breakdown
Overall Performance (19 benchmarks)
Average Score: 71.6%
Best Score: 97.1%
High Performers (80%+): 7

Performance Metrics
Max Context Window: 300.0K tokens
Avg Throughput: 41.0 tok/s
Avg Latency: 8 ms
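One way to read the two serving metrics together, under the simplifying assumption that total completion time is roughly first-token latency plus output tokens divided by throughput (the 500-token answer length is hypothetical):

```python
# Rough completion-time estimate from the serving metrics above.
# Assumption: total_time ~= latency + output_tokens / throughput.
LATENCY_S = 0.008      # 8 ms average latency
THROUGHPUT_TPS = 41.0  # 41.0 tokens/second average throughput

def estimated_seconds(output_tokens: int) -> float:
    return LATENCY_S + output_tokens / THROUGHPUT_TPS

print(f"{estimated_seconds(500):.1f} s")  # ~12.2 s for a 500-token answer
```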
All Benchmark Results for o1
Complete list of benchmark scores with detailed information
| Benchmark | Modality | Score | Percentage | Source |
|---|---|---|---|---|
| GSM8k | text | 0.97 | 97.1% | Self-reported |
| MATH | text | 0.96 | 96.4% | Self-reported |
| GPQA Physics | text | 0.93 | 92.8% | Self-reported |
| MMLU | text | 0.92 | 91.8% | Self-reported |
| MGSM | text | 0.89 | 89.3% | Self-reported |
| HumanEval | text | 0.88 | 88.1% | Self-reported |
| MMMLU | text | 0.88 | 87.7% | Self-reported |
| GPQA | text | 0.78 | 78.0% | Self-reported |
| MMMU | multimodal | 0.78 | 77.6% | Self-reported |
| AIME 2024 | text | 0.74 | 74.3% | Self-reported |
Top 10 of 19 benchmarks shown.
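The overview figures above can be cross-checked against the listed rows. Only the top 10 of 19 benchmarks appear here, so the 71.6% overall average is not reproducible from this subset, but the best score and the high-performer count both match:

```python
# Recompute "Best Score" and "High Performers (80%+)" from the visible rows.
# Note: only 10 of the 19 benchmarks are listed, so the overall
# average (71.6%) cannot be recomputed from this subset.
scores = {
    "GSM8k": 97.1, "MATH": 96.4, "GPQA Physics": 92.8, "MMLU": 91.8,
    "MGSM": 89.3, "HumanEval": 88.1, "MMMLU": 87.7, "GPQA": 78.0,
    "MMMU": 77.6, "AIME 2024": 74.3,
}

best = max(scores.values())
high = sum(1 for s in scores.values() if s >= 80.0)

print(best)  # 97.1 -> matches "Best Score"
print(high)  # 7    -> matches "High Performers (80%+)"
```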