Kimi K2 is a state-of-the-art mixture-of-experts (MoE) language model with 32 billion activated parameters and 1 trillion total parameters. Trained with the MuonClip optimizer, it achieves exceptional performance across frontier knowledge, reasoning, and coding tasks while being meticulously optimized for agentic capabilities. The instruct variant is post-trained for drop-in, general-purpose chat and agentic experiences without long thinking.
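As a sketch of the agentic use described above, the snippet below sends a single tool-calling request through the OpenAI-compatible Python SDK. The base URL, API key, model id, and the `get_weather` tool are assumptions standing in for whatever endpoint actually serves the model, not an official configuration.

```python
# Minimal tool-calling sketch against an OpenAI-compatible endpoint serving Kimi K2.
# The endpoint URL, key, and model id below are placeholders, not an official setup.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # assumed local server

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool, for illustration only
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="moonshotai/Kimi-K2-Instruct",  # assumed model id
    messages=[{"role": "user", "content": "What's the weather in Tokyo right now?"}],
    tools=tools,
    tool_choice="auto",
)

# If the model decides to call the tool, the structured call arrives here.
print(response.choices[0].message.tool_calls)
```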
DeepSeek-V3.2-Exp is an experimental iteration introducing DeepSeek Sparse Attention (DSA) to improve long-context training and inference efficiency while keeping output quality on par with V3.1. It explores fine-grained sparse attention for extended sequence processing.
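DSA itself is not spelled out in this description beyond the name, so the toy function below only illustrates the general idea behind fine-grained sparse attention: each query attends to a small top-k subset of keys instead of the full sequence. It is a conceptual sketch, not DeepSeek's actual DSA algorithm.

```python
import numpy as np

def topk_sparse_attention(q, k, v, top_k=8):
    """Toy single-head attention where each query keeps only its top-k key scores.

    Conceptual illustration of fine-grained sparsity; not DeepSeek's DSA.
    q: (n, d), k: (m, d), v: (m, d)
    """
    scores = q @ k.T / np.sqrt(q.shape[-1])          # (n, m) dense scores, kept for clarity
    # Keep only the top-k entries per query row; everything else is masked out.
    kth = np.partition(scores, -top_k, axis=-1)[:, -top_k:].min(axis=-1, keepdims=True)
    masked = np.where(scores >= kth, scores, -np.inf)
    weights = np.exp(masked - masked.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                                # (n, d)

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((16, 64)) for _ in range(3))
out = topk_sparse_attention(q, k, v, top_k=4)
print(out.shape)  # (16, 64)
```

A real sparse-attention kernel would avoid materializing the dense score matrix at all, which is where the long-context efficiency gain comes from; the dense version here only keeps the arithmetic easy to follow.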
DeepSeek-V3.1 is a hybrid model supporting both thinking and non-thinking modes through different chat templates. Built on DeepSeek-V3.1-Base with a two-phase long context extension (32K phase: 630B tokens; 128K phase: 209B tokens), it features 671B total parameters with 37B activated. Key improvements include smarter tool calling through post-training optimization, higher thinking efficiency that reaches quality comparable to DeepSeek-R1-0528 while responding more quickly, and the UE8M0 FP8 scale data format for model weights and activations. The model excels in both reasoning tasks (thinking mode) and practical applications (non-thinking mode), with particularly strong performance in code agent tasks, math competitions, and search-based problem solving.
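To make the two-mode design concrete, the sketch below renders the same conversation with and without thinking mode through the Hugging Face tokenizer. The checkpoint id and the `thinking` keyword argument are assumptions based on the published chat template and may differ from the exact interface.

```python
# Sketch of switching DeepSeek-V3.1 between thinking and non-thinking modes via its
# chat template. The repo id and the `thinking` kwarg are assumed, not verified here.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-V3.1")  # assumed repo id

messages = [{"role": "user", "content": "Prove that the square root of 2 is irrational."}]

# Thinking mode: the template inserts the reasoning prefix so the model emits a thought trace.
thinking_prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, thinking=True
)

# Non-thinking mode: same conversation, direct answer without the long reasoning block.
direct_prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, thinking=False
)

print(thinking_prompt != direct_prompt)  # the two renderings differ only in the mode tokens
```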
GLM-4.5 is an Agentic, Reasoning, and Coding (ARC) foundation model designed for intelligent agents, featuring 355 billion total parameters with 32 billion active parameters in an MoE architecture. Trained on 23T tokens through multi-stage training, it is a hybrid reasoning model offering two modes: a thinking mode for complex reasoning and tool use, and a non-thinking mode for immediate responses. The model unifies agentic, reasoning, and coding capabilities with 128K context length support, scoring 63.2 across 12 industry-standard benchmarks and placing 3rd among all proprietary and open-source models. It is released under the MIT open-source license, allowing commercial use and secondary development.
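As an illustration of the two modes, the request below assumes GLM-4.5 is served behind an OpenAI-compatible endpoint (e.g. vLLM or SGLang) whose chat template accepts an `enable_thinking` flag; the flag name, endpoint, and model id are all assumptions rather than a documented contract.

```python
# Hedged sketch: toggling GLM-4.5 thinking mode through an OpenAI-compatible server.
# Endpoint, model id, and the `enable_thinking` chat-template flag are assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

def ask(prompt: str, thinking: bool) -> str:
    response = client.chat.completions.create(
        model="zai-org/GLM-4.5",  # assumed model id
        messages=[{"role": "user", "content": prompt}],
        extra_body={"chat_template_kwargs": {"enable_thinking": thinking}},  # assumed flag
    )
    return response.choices[0].message.content

print(ask("Summarize the MoE architecture in one sentence.", thinking=False))
print(ask("Plan a multi-step web research task about MoE routing.", thinking=True))
```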
DeepSeek-R1-0528 is the May 28, 2025 version of DeepSeek's reasoning model. It features advanced thinking capabilities and serves as a benchmark comparison for newer models like DeepSeek-V3.1. This model excels in complex reasoning tasks, mathematical problem-solving, and code generation through its thinking mode approach.
The gpt-oss-20b model delivers similar results to OpenAI o3-mini on common benchmarks and can run on edge devices with just 16 GB of memory, making it ideal for on-device use cases, local inference, or rapid iteration without costly infrastructure. It also performs strongly on tool use, few-shot function calling, and CoT reasoning (as seen in results on the Tau-Bench agentic evaluation suite), as well as on HealthBench, even outperforming proprietary models like OpenAI o1 and GPT-4o. Note: While referred to as '20b' for simplicity, it technically has 20.9B parameters.
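For the local-inference use case mentioned above, a minimal sketch with the Hugging Face pipeline API is shown below; the checkpoint id openai/gpt-oss-20b is assumed, as is the chat template handling the harmony response format behind the scenes.

```python
# Minimal local-inference sketch for gpt-oss-20b with the Hugging Face pipeline API.
# The repo id is assumed; the model's chat template is expected to handle the harmony
# response format, so plain chat messages are enough here.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="openai/gpt-oss-20b",  # assumed repo id
    torch_dtype="auto",
    device_map="auto",
)

messages = [{"role": "user", "content": "Explain mixture-of-experts routing in two sentences."}]
result = generator(messages, max_new_tokens=256)
print(result[0]["generated_text"][-1]["content"])  # last message is the assistant reply
```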
GPT-OSS-120B is an open-weight, 116.8B-parameter Mixture-of-Experts (MoE) language model from OpenAI designed for high-reasoning, agentic, and general-purpose production use cases. It activates 5.1B parameters per forward pass and is optimized to run on a single H100 GPU with native MXFP4 quantization. The model supports configurable reasoning depth, full chain-of-thought access, and native tool use, including function calling, browsing, and structured output generation. It achieves near-parity with OpenAI o4-mini on core reasoning benchmarks. Note: While referred to as '120b' for simplicity, it technically has 116.8B parameters.
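The configurable reasoning depth mentioned above is, per OpenAI's description, set in the system prompt. The sketch below assumes an OpenAI-compatible server hosting the model and uses the "Reasoning: high" convention from the harmony format; the endpoint, model id, and exact prompt line should be treated as assumptions rather than a verified interface.

```python
# Sketch of selecting reasoning depth for gpt-oss-120b via the system prompt.
# The endpoint and model id are placeholders; the "Reasoning: high" line follows
# the harmony-format convention described by OpenAI and is assumed here.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="openai/gpt-oss-120b",  # assumed model id
    messages=[
        {"role": "system", "content": "Reasoning: high"},  # low | medium | high
        {"role": "user", "content": "Work through 37 * 43 step by step, then give the result."},
    ],
)
print(response.choices[0].message.content)
```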
Qwen3-32B is a large language model from Alibaba's Qwen3 series. It features 32.8 billion parameters, a 128k token context window, support for 119 languages, and hybrid thinking modes allowing switching between deep reasoning and fast responses. It demonstrates strong performance in reasoning, instruction-following, and agent capabilities.
Qwen3-30B-A3B is a smaller Mixture-of-Experts (MoE) model from the Qwen3 series by Alibaba, with 30.5 billion total parameters and 3.3 billion activated parameters. It features hybrid thinking/non-thinking modes, supports 119 languages, and offers enhanced agent capabilities. It aims to outperform previous models such as QwQ-32B while using significantly fewer activated parameters.
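Both Qwen3 entries above expose the hybrid mode as a chat-template switch. The sketch below uses the `enable_thinking` flag documented for Qwen3 tokenizers, with the Qwen3-30B-A3B checkpoint id as an assumed example; the same flag applies across the series.

```python
# Sketch of Qwen3's hybrid thinking switch via the chat template's enable_thinking flag.
# The repo id is one example; the same flag is shared across the Qwen3 series.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-30B-A3B")  # assumed repo id

messages = [{"role": "user", "content": "How many prime numbers are there below 50?"}]

# Thinking mode: the model is prompted to produce a <think>...</think> trace first.
reasoning_prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=True
)

# Non-thinking mode: fast, direct answer with the reasoning block suppressed.
fast_prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=False
)

print(len(reasoning_prompt), len(fast_prompt))
```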
Llama 4 Scout is a natively multimodal model capable of processing both text and images. It uses a mixture-of-experts (MoE) architecture with 16 experts, 17 billion activated parameters, and 109 billion total parameters, supporting a wide range of multimodal tasks such as conversational interaction, image analysis, and code generation. The model includes a 10 million token context window.
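To illustrate the text-plus-image input path, the sketch below follows the transformers multimodal chat-message convention; the checkpoint id and image URL are assumptions, and a model of this size would in practice need multi-GPU sharding or quantized loading.

```python
# Sketch of multimodal (image + text) inference with Llama 4 Scout via transformers.
# Repo id and image URL are placeholders; the full model is far larger than a single
# consumer GPU and would need sharding or quantization in practice.
from transformers import AutoProcessor, Llama4ForConditionalGeneration

model_id = "meta-llama/Llama-4-Scout-17B-16E-Instruct"  # assumed repo id

processor = AutoProcessor.from_pretrained(model_id)
model = Llama4ForConditionalGeneration.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "url": "https://example.com/photo.jpg"},  # placeholder image
        {"type": "text", "text": "Describe this image in one sentence."},
    ],
}]

inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True, return_dict=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=64)
# Decode only the newly generated tokens, skipping the prompt.
print(processor.batch_decode(outputs[:, inputs["input_ids"].shape[-1]:], skip_special_tokens=True)[0])
```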