#01 DeepSeek-V3.2-Exp
DeepSeek-V3.2-Exp is an experimental iteration that introduces DeepSeek Sparse Attention (DSA) to improve long-context training and inference efficiency while keeping output quality on par with V3.1. It explores fine-grained sparse attention for extended sequence processing.
Released: Sep 29, 2025
Scores: 67.8% | 74.5% | - | 74.1% | -
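The released DSA kernel is not reproduced in this listing, but the core idea of fine-grained sparse attention, where each query attends only to a small per-query subset of keys, can be sketched as below. The top-k selection over raw attention scores stands in for DeepSeek's cheap indexing step; treat the whole block as an illustrative assumption, not the V3.2-Exp implementation.

```python
import torch
import torch.nn.functional as F

def sparse_attention(q, k, v, top_k=64):
    """Toy fine-grained sparse attention: each query attends only to its
    top_k highest-scoring keys instead of the full sequence.

    q, k, v: [batch, seq_len, dim]. Illustrative sketch of the general
    technique, not DeepSeek's DSA kernel: scores are still computed
    densely here, whereas a real kernel would avoid that work too.
    """
    scores = q @ k.transpose(-2, -1) / k.shape[-1] ** 0.5  # [B, Tq, Tk]
    top_k = min(top_k, k.shape[-2])
    # Keep only the top_k keys per query; mask the rest to -inf.
    idx = scores.topk(top_k, dim=-1).indices
    mask = torch.full_like(scores, float("-inf"))
    mask.scatter_(-1, idx, 0.0)
    probs = F.softmax(scores + mask, dim=-1)
    return probs @ v  # [B, Tq, dim]

q = torch.randn(1, 1024, 128)
k = torch.randn(1, 1024, 128)
v = torch.randn(1, 1024, 128)
out = sparse_attention(q, k, v, top_k=64)  # each query attends to 64 of 1024 keys
print(out.shape)  # torch.Size([1, 1024, 128])
```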
#02 DeepSeek-V3.1
DeepSeek-V3.1 is a hybrid model that supports both thinking and non-thinking modes through different chat templates. Built on DeepSeek-V3.1-Base with a two-phase long-context extension (32K phase: 630B tokens; 128K phase: 209B tokens), it has 671B total parameters with 37B activated per token. Key improvements include smarter tool calling through post-training optimization, higher thinking efficiency (quality comparable to DeepSeek-R1-0528 with faster responses), and the UE8M0 FP8 scale data format for model weights and activations. The model handles both reasoning tasks (thinking mode) and practical applications (non-thinking mode), with particularly strong results in code-agent tasks, math competitions, and search-based problem solving.
Released: Aug 21, 2025
Scores: 66.0% | 68.4% | - | 56.4% | -
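Because the two modes differ only in the chat template, switching between them is a prompt-construction choice rather than a model swap. A minimal sketch, assuming the Hugging Face tokenizer for deepseek-ai/DeepSeek-V3.1 exposes a `thinking` flag in its chat template (the flag name follows the model card and should be treated as an assumption):

```python
from transformers import AutoTokenizer

# Assumption: the DeepSeek-V3.1 chat template accepts a `thinking` kwarg
# that switches between the thinking and non-thinking prompt formats.
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-V3.1")

messages = [{"role": "user", "content": "What is 17 * 24?"}]

# Non-thinking mode: direct answer, lower latency.
prompt_fast = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, thinking=False
)

# Thinking mode: the template inserts the reasoning preamble before the answer.
prompt_think = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, thinking=True
)
```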
#03 DeepSeek-R1-0528
DeepSeek-R1-0528 is the May 28, 2025 revision of DeepSeek's reasoning model. It features advanced thinking capabilities and serves as a benchmark comparison for newer models such as DeepSeek-V3.1. It excels at complex reasoning, mathematical problem solving, and code generation through its thinking-mode approach.
Released: May 28, 2025
Scores: 44.6% | 71.6% | - | 73.3% | -
#04 DeepSeek-V3
A powerful Mixture-of-Experts (MoE) language model with 671B total parameters (37B activated per token). Features Multi-head Latent Attention (MLA), auxiliary-loss-free load balancing, and multi-token prediction training. Pre-trained on 14.8T tokens, with strong performance on reasoning, math, and code tasks.
Released: Dec 25, 2024
License: MIT + Model License (commercial use allowed)
Scores: 42.0% | 49.6% | - | 37.6% | -
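The 37B-of-671B figure comes from top-k expert routing: each token runs through only a handful of experts. The sketch below shows that routing step together with the bias-based, auxiliary-loss-free balancing idea from the V3 report; the `update_bias` rule is a simplified stand-in for the paper's schedule, and all names here are illustrative.

```python
import torch

def route_tokens(x, expert_centroids, bias, top_k=8):
    """Top-k MoE routing sketch: only top_k experts run per token, so the
    activated parameter count is a small fraction of the total.

    x: [tokens, dim]; expert_centroids: [n_experts, dim];
    bias: [n_experts], used for selection only (aux-loss-free balancing).
    """
    affinity = torch.sigmoid(x @ expert_centroids.T)        # [tokens, n_experts]
    chosen = (affinity + bias).topk(top_k, dim=-1).indices  # bias steers selection...
    gates = torch.gather(affinity, -1, chosen)              # ...but not the gate values
    gates = gates / gates.sum(dim=-1, keepdim=True)
    return chosen, gates

def update_bias(bias, chosen, n_experts, gamma=1e-3):
    """Simplified balancing step: lower the bias of overloaded experts and
    raise it for underloaded ones (a stand-in for V3's exact update)."""
    load = torch.bincount(chosen.flatten(), minlength=n_experts).float()
    return bias - gamma * torch.sign(load - load.mean())

n_experts, dim = 64, 128
x = torch.randn(32, dim)
centroids = torch.randn(n_experts, dim)
bias = torch.zeros(n_experts)
chosen, gates = route_tokens(x, centroids, bias, top_k=8)
bias = update_bias(bias, chosen, n_experts)
```

Keeping the bias out of the gate values is the point of the aux-loss-free scheme: load is balanced by nudging which experts get selected, without distorting the weights used to mix their outputs and without an extra loss term.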
#05 DeepSeek-V2.5
DeepSeek-V2.5 is an upgraded version that merges DeepSeek-V2-Chat and DeepSeek-Coder-V2-Instruct, integrating general and coding abilities. It aligns better with human preferences and has been optimized in several areas, including writing and instruction following.
Released: Sep 5, 2024
Scores: 16.8% | - | 89.0% | - | -
#06 DeepSeek-V3-0324
An updated March 2025 checkpoint of DeepSeek-V3 with the same architecture: a Mixture-of-Experts (MoE) language model with 671B total parameters (37B activated per token), featuring Multi-head Latent Attention (MLA), auxiliary-loss-free load balancing, and multi-token prediction training, pre-trained on 14.8T tokens with strong performance in reasoning, math, and code tasks.
Released: Mar 25, 2025
License: MIT + Model License (commercial use allowed)
Scores: - | - | - | 49.2% | -
#07 DeepSeek-R1-Distill-Qwen-1.5B
A dense 1.5B model distilled from DeepSeek-R1: Qwen2.5-Math-1.5B fine-tuned on reasoning traces generated by DeepSeek-R1 (itself built atop DeepSeek-V3, with 671B total parameters and 37B activated per token). The distillation transfers chain-of-thought capabilities for math, code, and multi-step reasoning to a much smaller model.
Released: Jan 20, 2025
Scores: - | - | - | 16.9% | -
#08 DeepSeek-R1-Distill-Qwen-14B
A dense 14B model distilled from DeepSeek-R1: Qwen2.5-14B fine-tuned on reasoning traces generated by DeepSeek-R1. It inherits chain-of-thought capabilities for math, code, and multi-step reasoning from the much larger teacher.
Released: Jan 20, 2025
Scores: - | - | - | 53.1% | -
#09 DeepSeek-R1-Distill-Qwen-32B
A dense 32B model distilled from DeepSeek-R1: Qwen2.5-32B fine-tuned on reasoning traces generated by DeepSeek-R1. The largest of the Qwen-based distillations, it inherits chain-of-thought capabilities for math, code, and multi-step reasoning.
Released: Jan 20, 2025
Scores: - | - | - | 57.2% | -
#10 DeepSeek-R1-Distill-Qwen-7B
A dense 7B model distilled from DeepSeek-R1: Qwen2.5-Math-7B fine-tuned on reasoning traces generated by DeepSeek-R1. It inherits chain-of-thought capabilities for math, code, and multi-step reasoning from the much larger teacher (see the distillation sketch below).
Released: Jan 20, 2025
Scores: - | - | - | 37.6% | -
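All four distilled checkpoints follow the same recipe: a Qwen2.5 base model fine-tuned with plain supervised learning on reasoning traces sampled from DeepSeek-R1. A minimal sketch of one such distillation step, assuming the Qwen base model name is as published; the training example is a hypothetical placeholder, and the real run used roughly 800K curated samples:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Distillation here is ordinary SFT: the student trains on text generated
# by the teacher, with no KL term against the teacher's logits.
student_name = "Qwen/Qwen2.5-Math-1.5B"  # base of DeepSeek-R1-Distill-Qwen-1.5B
tokenizer = AutoTokenizer.from_pretrained(student_name)
student = AutoModelForCausalLM.from_pretrained(student_name)
optimizer = torch.optim.AdamW(student.parameters(), lr=1e-5)

# Hypothetical teacher-generated trace standing in for the curated corpus.
trace = "Question: ...\n<think>step-by-step reasoning...</think>\nAnswer: ..."
batch = tokenizer(trace, return_tensors="pt", truncation=True, max_length=2048)

outputs = student(**batch, labels=batch["input_ids"])  # causal-LM cross-entropy
outputs.loss.backward()
optimizer.step()
```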