
GPT OSS 120B

Zero-eval rankings: #1 CodeForces · #1 HealthBench · #1 HealthBench Hard

by OpenAI

About

GPT OSS 120B is a language model developed by OpenAI. It achieves strong performance with an average score of 61.7% across 7 benchmarks, excelling particularly in MMLU (90.0%), CodeForces (87.4%), and GPQA (80.1%). It supports a 262K-token context window for handling large documents and is available through 5 API providers. It is licensed under Apache 2.0 for commercial use, making it suitable for enterprise applications. Released in 2025, it represents OpenAI's latest advancement in AI technology.

Pricing Range
Input (per 1M tokens): $0.09 - $0.15
Output (per 1M tokens): $0.45 - $0.60
Providers: 5
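As a rough illustration of the price ranges above, the sketch below estimates the cost of a single request. The per-1M-token prices are taken from the pricing table; the token counts in the usage example are hypothetical.

```python
# Estimate per-request cost from the listed per-1M-token price ranges.
# Prices come from the pricing table above; token counts are made up.

INPUT_PRICE_RANGE = (0.09, 0.15)   # USD per 1M input tokens
OUTPUT_PRICE_RANGE = (0.45, 0.60)  # USD per 1M output tokens

def cost_range(input_tokens, output_tokens):
    """Return (min, max) USD cost for one request across providers."""
    lo = (input_tokens / 1e6) * INPUT_PRICE_RANGE[0] \
       + (output_tokens / 1e6) * OUTPUT_PRICE_RANGE[0]
    hi = (input_tokens / 1e6) * INPUT_PRICE_RANGE[1] \
       + (output_tokens / 1e6) * OUTPUT_PRICE_RANGE[1]
    return lo, hi

# Hypothetical request: 50K-token prompt, 10K-token reply.
lo, hi = cost_range(50_000, 10_000)
print(f"${lo:.4f} - ${hi:.4f}")  # → $0.0090 - $0.0135
```

Depending on which of the 5 providers you route to, the same request can cost up to ~1.7x more at the top of the range.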
Timeline
Announced: Aug 5, 2025
Released: Aug 5, 2025
License & Family
License: Apache 2.0
Performance Overview
Performance metrics and category breakdown

Overall Performance (7 benchmarks)
Average Score: 61.7%
Best Score: 90.0%
High Performers (80%+): 3

Performance Metrics
Max Context Window: 262.1K tokens
Avg Throughput: 371.7 tok/s
Avg Latency: 2 ms
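Assuming the usual reading of these two metrics (latency as time to first token, throughput as steady-state decode speed), a back-of-envelope estimate of end-to-end generation time can be sketched as follows. The 1,000-token response length is hypothetical.

```python
# Rough end-to-end generation time: time-to-first-token plus the
# remaining tokens at the average decode rate. Both constants are the
# averages listed above; the response length is a made-up example.

LATENCY_S = 0.002         # 2 ms average latency
THROUGHPUT_TPS = 371.7    # average tokens per second

def generation_time(output_tokens):
    """Estimated seconds to stream a full response."""
    return LATENCY_S + output_tokens / THROUGHPUT_TPS

print(f"{generation_time(1000):.2f} s")  # → 2.69 s
```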
All Benchmark Results for GPT OSS 120B
Complete list of benchmark scores with detailed information

Benchmark               Modality     Score    Source
MMLU                    text         90.0%    Self-reported
CodeForces              text         87.4%    Self-reported
GPQA                    text         80.1%    Self-reported
TAU-bench Retail        text         67.8%    Self-reported
HealthBench             text         57.6%    Self-reported
HealthBench Hard        text         30.0%    Self-reported
Humanity's Last Exam    multimodal   19.0%    Self-reported
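The 61.7% headline average quoted earlier is simply the unweighted mean of these seven self-reported scores, which a quick check confirms:

```python
# Unweighted mean of the seven self-reported benchmark scores above.
scores = {
    "MMLU": 90.0,
    "CodeForces": 87.4,
    "GPQA": 80.1,
    "TAU-bench Retail": 67.8,
    "HealthBench": 57.6,
    "HealthBench Hard": 30.0,
    "Humanity's Last Exam": 19.0,
}
average = sum(scores.values()) / len(scores)
print(f"{average:.1f}%")  # → 61.7%
```

Note that the unweighted mean hides a wide spread: scores range from 19.0% (Humanity's Last Exam) to 90.0% (MMLU).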