DeepSeek

Major Contributor
About

A Chinese AI company developing state-of-the-art large language models, including the DeepSeek-V3 series with a mixture-of-experts architecture and hybrid thinking/non-thinking modes.

Portfolio Stats
Total Models: 17
Multimodal: 3
Benchmarks Run: 156
Avg Performance: 63.1%
Latest Release
DeepSeek-V3.2-Exp
Released: Sep 29, 2025
Release Timeline
Recent model releases by year
2025: 12 models
2024: 5 models
Performance Overview
Top models and benchmark performance

Benchmark Categories

Other: 156 runs · 66.3% avg

Model Statistics

Multimodal Ratio: 18%
Models with Providers: 10

All Models

The complete portfolio of 17 models (first 10 shown below).

#01 · DeepSeek · DeepSeek-V3.2-Exp
DeepSeek-V3.2-Exp is an experimental iteration introducing DeepSeek Sparse Attention (DSA) to improve long-context training and inference efficiency while keeping output quality on par with V3.1. It explores fine-grained sparse attention for extended sequence processing.
Released: Sep 29, 2025
License: MIT
Benchmarks: 67.8% / 74.5% / - / 74.1% / -
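The card mentions fine-grained sparse attention only at a high level, so here is a minimal toy sketch of the general idea, assuming a simple per-query top-k key selection. DSA's actual selection mechanism is not described on this page and is not reproduced here; the name `topk_sparse_attention` and the `keep` parameter are illustrative only.

```python
import torch
import torch.nn.functional as F

def topk_sparse_attention(q, k, v, keep=64):
    """Toy per-query top-k sparse attention over (seq_len, d) tensors.
    Each query attends only to its `keep` highest-scoring keys instead
    of the full sequence. Note: this toy version still materializes the
    full score matrix, so it shows the sparsity pattern, not the
    efficiency win a real sparse kernel would deliver."""
    d = q.size(-1)
    scores = q @ k.T / d ** 0.5                   # (seq, seq) full attention scores
    keep = min(keep, k.size(0))
    top = scores.topk(keep, dim=-1)               # best `keep` keys per query
    masked = torch.full_like(scores, float("-inf"))
    masked.scatter_(-1, top.indices, top.values)  # everything else stays -inf
    return F.softmax(masked, dim=-1) @ v

q = k = v = torch.randn(1024, 128)
print(topk_sparse_attention(q, k, v).shape)       # torch.Size([1024, 128])
```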
#02 · DeepSeek · DeepSeek-V3.1
DeepSeek-V3.1 is a hybrid model that supports both thinking and non-thinking modes through different chat templates. Built on DeepSeek-V3.1-Base with a two-phase long-context extension (32K phase: 630B tokens; 128K phase: 209B tokens), it has 671B total parameters with 37B activated. Key improvements include smarter tool calling through post-training optimization, higher thinking efficiency (quality comparable to DeepSeek-R1-0528 with faster responses), and the UE8M0 FP8 scale data format for model weights and activations. The model handles both reasoning tasks (thinking mode) and practical applications (non-thinking mode), with particularly strong results in code-agent tasks, math competitions, and search-based problem solving.
Released: Aug 21, 2025
License: MIT
Benchmarks: 66.0% / 68.4% / - / 56.4% / -
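Because V3.1's thinking and non-thinking modes are selected by chat template, the toggle typically surfaces to API users as a model-name switch. The sketch below assumes DeepSeek's OpenAI-compatible endpoint and its documented `deepseek-chat` (non-thinking) and `deepseek-reasoner` (thinking) model names, which come from DeepSeek's public API docs rather than this page.

```python
# Hedged sketch: assumes DeepSeek's OpenAI-compatible API and the
# deepseek-chat / deepseek-reasoner model names from its public docs.
from openai import OpenAI

client = OpenAI(api_key="sk-...", base_url="https://api.deepseek.com")

def ask(prompt: str, thinking: bool = False) -> str:
    """Route the same prompt to the thinking or non-thinking variant."""
    resp = client.chat.completions.create(
        model="deepseek-reasoner" if thinking else "deepseek-chat",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

print(ask("Prove there are infinitely many primes.", thinking=True))
```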
#03 · DeepSeek · DeepSeek-R1-0528
DeepSeek-R1-0528 is the May 28, 2025 version of DeepSeek's reasoning model. It features advanced thinking capabilities and serves as a benchmark comparison for newer models such as DeepSeek-V3.1. It excels in complex reasoning tasks, mathematical problem solving, and code generation through its thinking-mode approach.
Released: May 28, 2025
License: MIT
Benchmarks: 44.6% / 71.6% / - / 73.3% / -
#04 · DeepSeek · DeepSeek-V3
A powerful Mixture-of-Experts (MoE) language model with 671B total parameters (37B activated per token). Features Multi-head Latent Attention (MLA), auxiliary-loss-free load balancing, and multi-token prediction training. Pre-trained on 14.8T tokens with strong performance in reasoning, math, and code tasks.
Released: Dec 25, 2024
License: MIT + Model License (commercial use allowed)
Benchmarks: 42.0% / 49.6% / - / 37.6% / -
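The 671B-total / 37B-activated split comes from MoE routing: each token is processed by only a few experts, so most parameters sit idle on any given token. Below is a minimal, self-contained top-k routing sketch; DeepSeek-V3's shared/routed expert split, MLA, and auxiliary-loss-free balancing are more involved and are not reproduced here, and all sizes are toy values.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Toy top-k mixture-of-experts layer: only k of n_experts run per
    token, which is the mechanism behind "671B total, 37B activated"."""
    def __init__(self, d=256, n_experts=8, k=2):
        super().__init__()
        self.router = nn.Linear(d, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d))
            for _ in range(n_experts)
        )
        self.k = k

    def forward(self, x):                              # x: (tokens, d)
        gate = F.softmax(self.router(x), dim=-1)
        weights, idx = gate.topk(self.k, dim=-1)       # pick k experts per token
        weights = weights / weights.sum(-1, keepdim=True)
        out = torch.zeros_like(x)
        for slot in range(self.k):                     # run only the chosen experts
            for e in range(len(self.experts)):
                sel = idx[:, slot] == e
                if sel.any():
                    out[sel] += weights[sel, slot, None] * self.experts[e](x[sel])
        return out

layer = TopKMoE()
print(layer(torch.randn(16, 256)).shape)               # torch.Size([16, 256])
```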
#05 · DeepSeek · DeepSeek-V2.5
DeepSeek-V2.5 is an upgraded version that combines DeepSeek-V2-Chat and DeepSeek-Coder-V2-Instruct, integrating general and coding abilities. It aligns better with human preferences and has been optimized in several areas, including writing and instruction following.
Released: May 8, 2024
License: DeepSeek Model License
Benchmarks: 16.8% / - / 89.0% / - / -
#06 · DeepSeek · DeepSeek-V3 0324
The March 2025 checkpoint of DeepSeek-V3, sharing the architecture described at #04: a 671B-parameter Mixture-of-Experts model (37B activated per token) with Multi-head Latent Attention (MLA), auxiliary-loss-free load balancing, and multi-token prediction training, pre-trained on 14.8T tokens.
Released: Mar 25, 2025
License: MIT + Model License (commercial use allowed)
Benchmarks: - / - / - / 49.2% / -
#07 · DeepSeek · DeepSeek R1 Distill Qwen 1.5B
A 1.5B-parameter Qwen-based distillation of DeepSeek-R1, DeepSeek's first-generation reasoning model built atop DeepSeek-V3 (671B total parameters, 37B activated per token). R1 uses large-scale reinforcement learning (RL) to strengthen its chain-of-thought and reasoning capabilities; this distilled variant carries those abilities into a much smaller dense model for math, code, and multi-step reasoning tasks.
Released: Jan 20, 2025
License: MIT
Benchmarks: - / - / - / 16.9% / -
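Cards #07 through #10 describe the teacher (R1) more than the distillation step itself. As a rough sketch, the R1 report describes sequence-level distillation: supervised fine-tuning a small student on reasoning traces sampled from the large teacher. The step below assumes a Hugging Face-style causal LM; the `student` model, data, and every hyperparameter are hypothetical placeholders, not DeepSeek's actual recipe.

```python
import torch
import torch.nn.functional as F

def distill_step(student, optimizer, input_ids):
    """One supervised fine-tuning step on a teacher-generated sequence
    (prompt + chain-of-thought + answer tokenized into `input_ids`).
    Plain next-token cross-entropy: the student imitates the teacher's
    sampled reasoning traces rather than matching its logits.
    `student` is assumed to be a Hugging Face-style causal LM."""
    logits = student(input_ids[:, :-1]).logits    # predict token t+1 from prefix
    loss = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),      # (batch * seq, vocab)
        input_ids[:, 1:].reshape(-1),             # shifted targets
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```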
#08 · DeepSeek · DeepSeek R1 Distill Qwen 14B
A 14B-parameter distillation in the same DeepSeek-R1 lineage as #07, transferring R1's RL-trained chain-of-thought and reasoning into a mid-sized Qwen-based dense model for math, code, and multi-step reasoning tasks.
Released: Jan 20, 2025
License: MIT
Benchmarks: - / - / - / 53.1% / -
#09 · DeepSeek · DeepSeek R1 Distill Qwen 32B
A 32B-parameter distillation in the same DeepSeek-R1 lineage as #07, the largest Qwen-based variant in this series and the strongest of the distilled models on the benchmark shown here.
Released: Jan 20, 2025
License: MIT
Benchmarks: - / - / - / 57.2% / -
#10 · DeepSeek · DeepSeek R1 Distill Qwen 7B
A 7B-parameter distillation in the same DeepSeek-R1 lineage as #07, packing R1's RL-trained reasoning into a compact Qwen-based dense model for math, code, and multi-step reasoning tasks.
Released: Jan 20, 2025
License: MIT
Benchmarks: - / - / - / 37.6% / -
Showing 1 to 10 of 17 models
Resources