DeepInfra

Major Platform

deepinfra.com

Platform Stats

Total Models: 30

Organizations: 8

Verified Benchmarks: 0

Multimodal Models: 9

Pricing Overview

Avg Input (per 1M tokens): $0.27

Avg Output (per 1M tokens): $0.63

Cheapest Model: $0.01

Premium Model: $1.79
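As a rough illustration of how per-1M-token pricing translates into a bill, the sketch below estimates the cost of a workload using the platform-wide average input and output prices listed above. The function name `estimate_cost_usd` and the example token counts are hypothetical, chosen only for illustration; actual charges depend on the specific model's rates, which can differ considerably from the averages.

```python
def estimate_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_price_per_m: float = 0.27,   # average input price per 1M tokens (from the stats above)
    output_price_per_m: float = 0.63,  # average output price per 1M tokens (from the stats above)
) -> float:
    """Estimate the USD cost of a workload priced per 1M tokens."""
    input_cost = (input_tokens / 1_000_000) * input_price_per_m
    output_cost = (output_tokens / 1_000_000) * output_price_per_m
    return input_cost + output_cost


# Hypothetical workload: 2M input tokens and 500K output tokens.
cost = estimate_cost_usd(2_000_000, 500_000)
print(f"Estimated cost: ${cost:.3f}")
```

Input and output tokens are priced separately, so output-heavy workloads cost more per token than the input average alone would suggest.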

Supported Features

Number of models supporting each feature

Web Search

Function Calling

Structured Output

Code Execution

Batch Inference

Fine-tuning

Input Modalities

Models supporting different input types

text

30 (100%)

image

9 (30%)

audio

0 (0%)

video

1 (3%)

Models Overview

Top performers and pricing distribution

Pricing Distribution

Input pricing per 1M tokens

$0-1

29 models

$1-5

1 model

Top Performing Models

By benchmark avg

#1 Llama 3.3 70B Instruct

79.9%

#2 Llama 3.1 405B Instruct

79.2%

#3 Qwen2.5 72B Instruct

77.4%

#4 Qwen3 235B A22B

76.2%

#5 DeepSeek R1 Distill Llama 70B

76.0%

Most Affordable Models

Llama 3.2 3B Instruct

$0.01/1M

Gemma 3 4B

$0.02/1M

Gemma 3 12B

$0.05/1M

Available Models

30 models available through DeepInfra

Model	Organization	Released	License
#01 GLM-4.6: GLM-4.6 is the latest version of Z.ai's flagship model, bringing significant improvements over GLM-4.5. Key features include: 200K token context window (expanded from 128K), superior coding performance with better real-world application in Claude Code/Cline/Roo Code/Kilo Code, advanced reasoning with tool use during inference, stronger agent capabilities, and refined writing aligned with human preferences. GLM-4.6 achieves competitive performance with DeepSeek-V3.2-Exp and Claude Sonnet 4, reaching near parity with Claude Sonnet 4 (48.6% win rate) on CC-Bench real-world coding tasks.	Zhipu AI	Sep 30, 2025	MIT	68.0%	-	-	-	-
#02 DeepSeek-V3.1: DeepSeek-V3.1 is a hybrid model supporting both thinking and non-thinking modes through different chat templates. Built on DeepSeek-V3.1-Base with a two-phase long context extension (32K phase: 630B tokens, 128K phase: 209B tokens), it features 671B total parameters with 37B activated. Key improvements include smarter tool calling through post-training optimization, higher thinking efficiency achieving comparable quality to DeepSeek-R1-0528 while responding more quickly, and UE8M0 FP8 scale data format for model weights and activations. The model excels in both reasoning tasks (thinking mode) and practical applications (non-thinking mode), with particularly strong performance in code agent tasks, math competitions, and search-based problem solving.	DeepSeek	Jan 10, 2025	MIT	66.0%	68.4%	-	56.4%	-
#03 GLM-4.5: GLM-4.5 is an Agentic, Reasoning, and Coding (ARC) foundation model designed for intelligent agents, featuring 355 billion total parameters with 32 billion active parameters using MoE architecture. Trained on 23T tokens through multi-stage training, it is a hybrid reasoning model that provides two modes: thinking mode for complex reasoning and tool usage, and non-thinking mode for immediate responses. The model unifies agentic, reasoning, and coding capabilities with 128K context length support. It scores 63.2 across 12 industry-standard benchmarks, placing 3rd among all proprietary and open-source models. Released under the MIT open-source license, which allows commercial use and secondary development.	Zhipu AI	Jul 28, 2025	MIT	64.2%	-	-	72.9%	-
#04 DeepSeek-R1-0528: DeepSeek-R1-0528 is the May 28, 2025 version of DeepSeek's reasoning model. It features advanced thinking capabilities and serves as a benchmark comparison for newer models like DeepSeek-V3.1. This model excels in complex reasoning tasks, mathematical problem-solving, and code generation through its thinking mode approach.	DeepSeek	May 28, 2025	MIT	44.6%	71.6%	-	73.3%	-
#05 DeepSeek-V2.5: DeepSeek-V2.5 is an upgraded version that combines DeepSeek-V2-Chat and DeepSeek-Coder-V2-Instruct, integrating general and coding abilities. It better aligns with human preferences and has been optimized in various aspects, including writing and instruction following.	DeepSeek	May 8, 2024	deepseek	16.8%	-	89.0%	-	-
#06 Llama 3.2 3B Instruct: Llama 3.2 3B Instruct is a large language model that supports a context length of 128K tokens and is state-of-the-art in its class for on-device use cases such as summarization, instruction following, and rewriting tasks running locally at the edge.	Meta	Sep 25, 2024	Llama 3.2 Community License	-	-	-	-	-
#07 Gemma 3 4B: Gemma 3 4B is a 4-billion-parameter vision-language model from Google, handling text and image input and generating text output. It features a 128K context window, multilingual support, and open weights. Suitable for question answering, summarization, reasoning, and image understanding tasks.	Google	Mar 12, 2025	Gemma	-	-	71.3%	12.6%	63.2%
#08 Gemma 3 12B: Gemma 3 12B is a 12-billion-parameter vision-language model from Google, handling text and image input and generating text output. It features a 128K context window, multilingual support, and open weights. Suitable for question answering, summarization, reasoning, and image understanding tasks.	Google	Mar 12, 2025	Gemma	-	-	85.4%	24.6%	73.0%
#09 Phi-4-multimodal-instruct: Phi-4-multimodal-instruct is a lightweight (5.57B parameters) open multimodal foundation model that leverages research and datasets from Phi-3.5 and 4.0. It processes text, image, and audio inputs to generate text outputs, supporting a 128K token context length. Enhanced via SFT, DPO, and RLHF for instruction following and safety.	Microsoft	Feb 1, 2025	MIT	-	-	-	-	-
#10 Llama 3.2 11B Instruct: Llama 3.2 11B Vision Instruct is an instruction-tuned multimodal large language model optimized for visual recognition, image reasoning, captioning, and answering general questions about an image. It accepts text and images as input and generates text as output.	Meta	Sep 25, 2024	Llama 3.2 Community License	-	-	-	-	-

Showing 1 to 10 of 30 models
