DeepInfra

Major Platform
deepinfra.com

Platform Stats
Total Models: 30
Organizations: 8
Verified Benchmarks: 0
Multimodal Models: 9

Pricing Overview
Avg Input (per 1M tokens): $0.27
Avg Output (per 1M tokens): $0.63
Cheapest Model: $0.01
Premium Model: $1.79
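
The per-1M-token prices above translate into a request cost by simple proportion. The sketch below is illustrative only: the function name and the token counts are assumptions, and the platform averages stand in for a specific model's price, which may differ.

```python
def estimate_cost_usd(input_tokens, output_tokens,
                      input_price_per_m, output_price_per_m):
    """Return the estimated cost in USD, given prices quoted per 1M tokens."""
    input_cost = (input_tokens / 1_000_000) * input_price_per_m
    output_cost = (output_tokens / 1_000_000) * output_price_per_m
    return input_cost + output_cost


# Platform-wide averages from the pricing overview (not any one model's price).
AVG_INPUT_PER_M = 0.27
AVG_OUTPUT_PER_M = 0.63

# Hypothetical workload: 2M input tokens and 500K output tokens.
cost = estimate_cost_usd(2_000_000, 500_000, AVG_INPUT_PER_M, AVG_OUTPUT_PER_M)
print(f"Estimated cost: ${cost:.3f}")
```

Output tokens are priced separately from input tokens, so workloads that generate long responses cost more than the input price alone suggests.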

Supported Features
Number of models supporting each feature
Web search: 0
Function calling: 30
Structured output: 29
Code execution: 0
Batch inference: 30
Fine-tuning: 0

Input Modalities
Models supporting different input types
Text: 30 (100%)
Image: 9 (30%)
Audio: 0 (0%)
Video: 1 (3%)
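
The percentages are each modality's model count divided by the 30 models listed, rounded to a whole number. A minimal sketch of that arithmetic, with variable names chosen here for illustration:

```python
TOTAL_MODELS = 30

# Model counts per input modality, taken from the list above.
counts = {"text": 30, "image": 9, "audio": 0, "video": 1}

# Share of the catalog supporting each modality, rounded to a whole percent.
shares = {name: round(100 * n / TOTAL_MODELS) for name, n in counts.items()}

for name, pct in shares.items():
    print(f"{name}: {counts[name]} ({pct}%)")
```

The shares do not sum to 100% because one model can accept several input types.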

Models Overview
Top performers and pricing distribution

Pricing Distribution

Input pricing per 1M tokens
$0-1: 29 models
$1-5: 1 model

Top Performing Models

By benchmark average
#1 Llama 3.3 70B Instruct: 79.9%
#2 Llama 3.1 405B Instruct: 79.2%
#3 Qwen2.5 72B Instruct: 77.4%
#4 Qwen3 235B A22B: 76.2%
#5 DeepSeek R1 Distill Llama 70B: 76.0%

Most Affordable Models

Llama 3.2 3B Instruct: $0.01/1M
Gemma 3 4B: $0.02/1M
Gemma 3 12B: $0.05/1M

Available Models

30 models available through DeepInfra

#01 GLM-4.6
GLM-4.6 is the latest version of Z.ai's flagship model, bringing significant improvements over GLM-4.5. Key features include: 200K token context window (expanded from 128K), superior coding performance with better real-world application in Claude Code/Cline/Roo Code/Kilo Code, advanced reasoning with tool use during inference, stronger agent capabilities, and refined writing aligned with human preferences. GLM-4.6 achieves competitive performance with DeepSeek-V3.2-Exp and Claude Sonnet 4, reaching near parity with Claude Sonnet 4 (48.6% win rate) on CC-Bench real-world coding tasks.
Sep 30, 2025
MIT
68.0% | - | - | - | -
#02 DeepSeek-V3.1 (DeepSeek)
DeepSeek-V3.1 is a hybrid model supporting both thinking and non-thinking modes through different chat templates. Built on DeepSeek-V3.1-Base with a two-phase long context extension (32K phase: 630B tokens, 128K phase: 209B tokens), it features 671B total parameters with 37B activated. Key improvements include smarter tool calling through post-training optimization, higher thinking efficiency achieving comparable quality to DeepSeek-R1-0528 while responding more quickly, and UE8M0 FP8 scale data format for model weights and activations. The model excels in both reasoning tasks (thinking mode) and practical applications (non-thinking mode), with particularly strong performance in code agent tasks, math competitions, and search-based problem solving.
Jan 10, 2025
MIT
66.0% | 68.4% | - | 56.4% | -
#03 GLM-4.5
GLM-4.5 is an Agentic, Reasoning, and Coding (ARC) foundation model designed for intelligent agents, featuring 355 billion total parameters with 32 billion active parameters using MoE architecture. Trained on 23T tokens through multi-stage training, it is a hybrid reasoning model that provides two modes: thinking mode for complex reasoning and tool usage, and non-thinking mode for immediate responses. The model unifies agentic, reasoning, and coding capabilities with 128K context length support. It achieves exceptional performance with a score of 63.2 across 12 industry-standard benchmarks, placing 3rd among all proprietary and open-source models. Released under MIT open-source license allowing commercial use and secondary development.
Jul 28, 2025
MIT
64.2% | - | - | 72.9% | -
#04 DeepSeek-R1-0528 (DeepSeek)
DeepSeek-R1-0528 is the May 28, 2025 version of DeepSeek's reasoning model. It features advanced thinking capabilities and serves as a benchmark comparison for newer models like DeepSeek-V3.1. This model excels in complex reasoning tasks, mathematical problem-solving, and code generation through its thinking mode approach.
May 28, 2025
MIT
44.6% | 71.6% | - | 73.3% | -
#05 DeepSeek-V2.5 (DeepSeek)
DeepSeek-V2.5 is an upgraded version that combines DeepSeek-V2-Chat and DeepSeek-Coder-V2-Instruct, integrating general and coding abilities. It better aligns with human preferences and has been optimized in various aspects, including writing and instruction following.
May 8, 2024
deepseek
16.8% | - | 89.0% | - | -
#06 Llama 3.2 3B Instruct (Meta)
Llama 3.2 3B Instruct is a large language model that supports a context length of 128K tokens and is state-of-the-art in its class for on-device use cases such as summarization, instruction following, and rewriting tasks running locally at the edge.
Sep 25, 2024
Llama 3.2 Community License
- | - | - | - | -
#07 Gemma 3 4B (Google)
Gemma 3 4B is a 4-billion-parameter vision-language model from Google, handling text and image input and generating text output. It features a 128K context window, multilingual support, and open weights. Suitable for question answering, summarization, reasoning, and image understanding tasks.
Mar 12, 2025
Gemma
- | - | 71.3% | 12.6% | 63.2%
#08 Gemma 3 12B (Google)
Gemma 3 12B is a 12-billion-parameter vision-language model from Google, handling text and image input and generating text output. It features a 128K context window, multilingual support, and open weights. Suitable for question answering, summarization, reasoning, and image understanding tasks.
Mar 12, 2025
Gemma
- | - | 85.4% | 24.6% | 73.0%
#09 Phi-4-multimodal-instruct (Microsoft)
Phi-4-multimodal-instruct is a lightweight (5.57B parameters) open multimodal foundation model that leverages research and datasets from Phi-3.5 and 4.0. It processes text, image, and audio inputs to generate text outputs, supporting a 128K token context length. Enhanced via SFT, DPO, and RLHF for instruction following and safety.
Feb 1, 2025
MIT
- | - | - | - | -
#10 Llama 3.2 11B Instruct (Meta)
Llama 3.2 11B Vision Instruct is an instruction-tuned multimodal large language model optimized for visual recognition, image reasoning, captioning, and answering general questions about an image. It accepts text and images as input and generates text as output.
Sep 25, 2024
Llama 3.2 Community License
- | - | - | - | -
Showing 1 to 10 of 30 models