Novita

novita.ai
Platform Stats
Total Models: 20
Organizations: 7
Verified Benchmarks: 0
Multimodal Models: 4
Pricing Overview
Avg Input (per 1M tokens): $0.28
Avg Output (per 1M tokens): $1.25
Cheapest Model: $0.05
Premium Model: $0.70
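At per-1M-token rates, estimating the cost of a request is simple arithmetic. A minimal sketch using the average prices above, with hypothetical token counts:

```python
# Per-1M-token rates from the Pricing Overview (averages across models).
AVG_INPUT_PER_M = 0.28   # dollars per 1M input tokens
AVG_OUTPUT_PER_M = 1.25  # dollars per 1M output tokens

def estimate_cost(input_tokens: int, output_tokens: int,
                  in_rate: float = AVG_INPUT_PER_M,
                  out_rate: float = AVG_OUTPUT_PER_M) -> float:
    """Dollar cost of one request at the given per-1M-token rates."""
    return input_tokens / 1_000_000 * in_rate + output_tokens / 1_000_000 * out_rate

# A 4,000-token prompt with a 1,000-token completion at the average rates:
print(f"${estimate_cost(4_000, 1_000):.5f}")  # → $0.00237
```

Per-model prices vary (from $0.05 to $0.70 per 1M input tokens on this page), so substitute the listed rate for the model you actually call.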
Supported Features
Number of models supporting each feature:
Web Search: 0
Function Calling: 20
Structured Output: 20
Code Execution: 0
Batch Inference: 20
Fine-tuning: 0
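Function calling and structured output are supported by all 20 models. As an illustrative sketch (not Novita's documented schema: the field names follow the common OpenAI-style chat-completions convention, and the model id and tool here are hypothetical), a tool-enabled request body looks like:

```python
import json

# A hypothetical tool, with its arguments described as JSON Schema.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

request_body = {
    "model": "qwen3-32b",  # placeholder id; use the provider's exact model id
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": tools,
    "tool_choice": "auto",  # let the model decide whether to call the tool
}

print(request_body["tools"][0]["function"]["name"])  # → get_weather
```

On a tool call, the model returns the function name and a JSON arguments string matching the schema; your code executes the function and sends the result back as a follow-up message.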
Input Modalities
Models supporting different input types:
Text: 20 (100%)
Image: 4 (20%)
Audio: 0 (0%)
Video: 1 (5%)
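For the four image-capable models, multimodal input is typically sent as a mixed-content message. A hedged sketch of the common OpenAI-style shape (verify the exact field names against Novita's docs; the image URL is a placeholder):

```python
# One user message mixing text and an image reference (placeholder URL).
message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Describe this image."},
        {"type": "image_url",
         "image_url": {"url": "https://example.com/photo.png"}},
    ],
}

# List the content-part types present in the message.
kinds = [part["type"] for part in message["content"]]
print(kinds)  # → ['text', 'image_url']
```

Text-only models reject such messages, so check the modality table above before sending image parts.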
Models Overview
Top performers and pricing distribution

Pricing Distribution
Input pricing per 1M tokens:
$0-1: 20 models

Top Performing Models
By benchmark average:
#1 Kimi K2 0905: 84.0%
#2 Qwen3 235B A22B: 76.2%
#3 Qwen3 30B A3B: 73.3%
#4 Qwen3-235B-A22B-Instruct-2507: 72.1%
#5 Qwen3 32B: 72.0%

Most Affordable Models
GPT OSS 20B: $0.05/1M
GPT OSS 120B: $0.10/1M
Qwen3 32B: $0.10/1M

Available Models

20 models available through Novita
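Calling one of these models usually goes through an OpenAI-compatible chat-completions endpoint. The sketch below only builds the HTTP request; the base URL, model id, and header layout are assumptions to verify against Novita's API documentation, and the API key is a placeholder:

```python
import json
import urllib.request

BASE_URL = "https://api.novita.ai/v3/openai"  # assumed OpenAI-compatible base URL
API_KEY = "<YOUR_API_KEY>"                    # placeholder; do not commit real keys

body = json.dumps({
    "model": "deepseek-v3.1",                 # placeholder model id from the list
    "messages": [{"role": "user", "content": "Hello!"}],
}).encode("utf-8")

req = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=body,
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
)
# response = urllib.request.urlopen(req)      # uncomment with a real key
print(req.get_method())  # → POST
```

Because the payload shape is the standard chat-completions format, existing OpenAI-style client libraries can usually be pointed at such an endpoint by overriding their base URL.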

#01 Moonshot AI: Kimi K2 Instruct
Kimi K2 is a state-of-the-art mixture-of-experts (MoE) language model with 32 billion activated parameters and 1 trillion total parameters. Trained with the MuonClip optimizer, it achieves exceptional performance across frontier knowledge, reasoning, and coding tasks while being meticulously optimized for agentic capabilities. The instruct variant is post-trained for drop-in, general-purpose chat and agentic experiences without long thinking.
Released: Jul 11, 2025 | License: MIT | Benchmarks: 71.6% / 60.0% / 93.3% / – / –
#02 DeepSeek: DeepSeek-V3.2-Exp
DeepSeek-V3.2-Exp is an experimental iteration introducing DeepSeek Sparse Attention (DSA) to improve long-context training and inference efficiency while keeping output quality on par with V3.1. It explores fine-grained sparse attention for extended sequence processing.
Released: Sep 29, 2025 | License: MIT | Benchmarks: 67.8% / 74.5% / – / 74.1% / –
#03 DeepSeek: DeepSeek-V3.1
DeepSeek-V3.1 is a hybrid model supporting both thinking and non-thinking modes through different chat templates. Built on DeepSeek-V3.1-Base with a two-phase long context extension (32K phase: 630B tokens, 128K phase: 209B tokens), it features 671B total parameters with 37B activated. Key improvements include smarter tool calling through post-training optimization, higher thinking efficiency achieving comparable quality to DeepSeek-R1-0528 while responding more quickly, and UE8M0 FP8 scale data format for model weights and activations. The model excels in both reasoning tasks (thinking mode) and practical applications (non-thinking mode), with particularly strong performance in code agent tasks, math competitions, and search-based problem solving.
Released: Aug 21, 2025 | License: MIT | Benchmarks: 66.0% / 68.4% / – / 56.4% / –
#04 Zhipu AI: GLM-4.5
GLM-4.5 is an Agentic, Reasoning, and Coding (ARC) foundation model designed for intelligent agents, featuring 355 billion total parameters with 32 billion active parameters using MoE architecture. Trained on 23T tokens through multi-stage training, it is a hybrid reasoning model that provides two modes: thinking mode for complex reasoning and tool usage, and non-thinking mode for immediate responses. The model unifies agentic, reasoning, and coding capabilities with 128K context length support. It achieves exceptional performance with a score of 63.2 across 12 industry-standard benchmarks, placing 3rd among all proprietary and open-source models. Released under MIT open-source license allowing commercial use and secondary development.
Released: Jul 28, 2025 | License: MIT | Benchmarks: 64.2% / – / – / 72.9% / –
#05 DeepSeek: DeepSeek-R1-0528
DeepSeek-R1-0528 is the May 28, 2025 version of DeepSeek's reasoning model. It features advanced thinking capabilities and serves as a benchmark comparison for newer models like DeepSeek-V3.1. This model excels in complex reasoning tasks, mathematical problem-solving, and code generation through its thinking mode approach.
Released: May 28, 2025 | License: MIT | Benchmarks: 44.6% / 71.6% / – / 73.3% / –
#06 OpenAI: GPT OSS 20B
The gpt-oss-20b model delivers similar results to OpenAI o3-mini on common benchmarks and can run on edge devices with just 16 GB of memory, making it ideal for on-device use cases, local inference, or rapid iteration without costly infrastructure. It also performs strongly on tool use, few-shot function calling, and CoT reasoning (as seen in results on the Tau-Bench agentic evaluation suite), and on HealthBench it even outperforms proprietary models like OpenAI o1 and GPT-4o. Note: while referred to as '20b' for simplicity, it technically has 20.9B parameters.
Released: Aug 5, 2025 | License: Apache 2.0 | Benchmarks: not reported
#07 OpenAI: GPT OSS 120B
GPT-OSS-120B is an open-weight, 116.8B-parameter Mixture-of-Experts (MoE) language model from OpenAI designed for high-reasoning, agentic, and general-purpose production use cases. It activates 5.1B parameters per forward pass and is optimized to run on a single H100 GPU with native MXFP4 quantization. The model supports configurable reasoning depth, full chain-of-thought access, and native tool use, including function calling, browsing, and structured output generation. It achieves near-parity with OpenAI o4-mini on core reasoning benchmarks. Note: While referred to as '120b' for simplicity, it technically has 116.8B parameters.
Released: Aug 5, 2025 | License: Apache 2.0 | Benchmarks: not reported
#08 Alibaba Cloud / Qwen Team: Qwen3 32B
Qwen3-32B is a large language model from Alibaba's Qwen3 series. It features 32.8 billion parameters, a 128k token context window, support for 119 languages, and hybrid thinking modes allowing switching between deep reasoning and fast responses. It demonstrates strong performance in reasoning, instruction-following, and agent capabilities.
Released: Apr 29, 2025 | License: Apache 2.0 | Benchmarks: – / – / – / 65.7% / –
#09 Alibaba Cloud / Qwen Team: Qwen3 30B A3B
Qwen3-30B-A3B is a smaller Mixture-of-Experts (MoE) model from the Qwen3 series by Alibaba, with 30.5 billion total parameters and 3.3 billion activated parameters. Features hybrid thinking/non-thinking modes, support for 119 languages, and enhanced agent capabilities. It aims to outperform previous models like QwQ-32B while using significantly fewer activated parameters.
Released: Apr 29, 2025 | License: Apache 2.0 | Benchmarks: – / – / – / 62.6% / –
#10 Meta: Llama 4 Scout
Llama 4 Scout is a natively multimodal model capable of processing both text and images. It features a 17 billion activated parameter (109B total) mixture-of-experts (MoE) architecture with 16 experts, supporting a wide range of multimodal tasks such as conversational interaction, image analysis, and code generation. The model includes a 10 million token context window.
Released: Apr 5, 2025 | License: Llama 4 Community License Agreement | Benchmarks: – / – / – / 32.8% / 67.8%
Showing 1 to 10 of 20 models
Resources