#01 DeepSeek-V3.2-Exp
DeepSeek-V3.2-Exp is an experimental iteration that introduces DeepSeek Sparse Attention (DSA) to improve long-context training and inference efficiency while keeping output quality on par with V3.1. It explores fine-grained sparse attention for extended sequence processing.
Released: Sep 29, 2025
Scores: 67.8% | 74.5% | - | 74.1% | -
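The released DSA kernel is not reproduced in this listing, but the core idea of fine-grained sparse attention, where each query attends only to a small per-query subset of keys, can be sketched as below. The top-k selection over raw attention scores stands in for DeepSeek's cheap indexing step; treat the whole block as an illustrative assumption, not the V3.2-Exp implementation.

```python
import torch
import torch.nn.functional as F

def sparse_attention(q, k, v, top_k=64):
    """Toy fine-grained sparse attention: each query attends only to its
    top_k highest-scoring keys instead of the full sequence.

    q, k, v: [batch, seq_len, dim]. Illustrative sketch of the general
    technique, not DeepSeek's DSA kernel: scores are still computed
    densely here, whereas a real kernel would avoid that work too.
    """
    scores = q @ k.transpose(-2, -1) / k.shape[-1] ** 0.5  # [B, Tq, Tk]
    top_k = min(top_k, k.shape[-2])
    # Keep only the top_k keys per query; mask the rest to -inf.
    idx = scores.topk(top_k, dim=-1).indices
    mask = torch.full_like(scores, float("-inf"))
    mask.scatter_(-1, idx, 0.0)
    probs = F.softmax(scores + mask, dim=-1)
    return probs @ v  # [B, Tq, dim]

q = torch.randn(1, 1024, 128)
k = torch.randn(1, 1024, 128)
v = torch.randn(1, 1024, 128)
out = sparse_attention(q, k, v, top_k=64)  # each query attends to 64 of 1024 keys
print(out.shape)  # torch.Size([1, 1024, 128])
```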
#02 DeepSeek-V3.1
DeepSeek-V3.1 is a hybrid model that supports both thinking and non-thinking modes through different chat templates. Built on DeepSeek-V3.1-Base with a two-phase long-context extension (32K phase: 630B tokens; 128K phase: 209B tokens), it has 671B total parameters with 37B activated per token. Key improvements include smarter tool calling through post-training optimization, higher thinking efficiency (quality comparable to DeepSeek-R1-0528 with faster responses), and the UE8M0 FP8 scale data format for model weights and activations. The model handles both reasoning tasks (thinking mode) and practical applications (non-thinking mode), with particularly strong results in code-agent tasks, math competitions, and search-based problem solving.
Released: Aug 21, 2025
Scores: 66.0% | 68.4% | - | 56.4% | -
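Because the two modes differ only in the chat template, switching between them is a prompt-construction choice rather than a model swap. A minimal sketch, assuming the Hugging Face tokenizer for deepseek-ai/DeepSeek-V3.1 exposes a `thinking` flag in its chat template (the flag name follows the model card and should be treated as an assumption):

```python
from transformers import AutoTokenizer

# Assumption: the DeepSeek-V3.1 chat template accepts a `thinking` kwarg
# that switches between the thinking and non-thinking prompt formats.
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-V3.1")

messages = [{"role": "user", "content": "What is 17 * 24?"}]

# Non-thinking mode: direct answer, lower latency.
prompt_fast = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, thinking=False
)

# Thinking mode: the template inserts the reasoning preamble before the answer.
prompt_think = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, thinking=True
)
```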
#03 DeepSeek-R1-0528
DeepSeek-R1-0528 is the May 28, 2025 revision of DeepSeek's reasoning model. It features advanced thinking capabilities and serves as a benchmark comparison for newer models such as DeepSeek-V3.1. It excels at complex reasoning, mathematical problem solving, and code generation through its thinking-mode approach.
Released: May 28, 2025
Scores: 44.6% | 71.6% | - | 73.3% | -
#04 DeepSeek-V3
A powerful Mixture-of-Experts (MoE) language model with 671B total parameters (37B activated per token). Features Multi-head Latent Attention (MLA), auxiliary-loss-free load balancing, and multi-token prediction training. Pre-trained on 14.8T tokens, with strong performance on reasoning, math, and code tasks.
Released: Dec 25, 2024
License: MIT + Model License (commercial use allowed)
Scores: 42.0% | 49.6% | - | 37.6% | -
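The 37B-of-671B figure comes from top-k expert routing: each token runs through only a handful of experts. The sketch below shows that routing step together with the bias-based, auxiliary-loss-free balancing idea from the V3 report; the `update_bias` rule is a simplified stand-in for the paper's schedule, and all names here are illustrative.

```python
import torch

def route_tokens(x, expert_centroids, bias, top_k=8):
    """Top-k MoE routing sketch: only top_k experts run per token, so the
    activated parameter count is a small fraction of the total.

    x: [tokens, dim]; expert_centroids: [n_experts, dim];
    bias: [n_experts], used for selection only (aux-loss-free balancing).
    """
    affinity = torch.sigmoid(x @ expert_centroids.T)        # [tokens, n_experts]
    chosen = (affinity + bias).topk(top_k, dim=-1).indices  # bias steers selection...
    gates = torch.gather(affinity, -1, chosen)              # ...but not the gate values
    gates = gates / gates.sum(dim=-1, keepdim=True)
    return chosen, gates

def update_bias(bias, chosen, n_experts, gamma=1e-3):
    """Simplified balancing step: lower the bias of overloaded experts and
    raise it for underloaded ones (a stand-in for V3's exact update)."""
    load = torch.bincount(chosen.flatten(), minlength=n_experts).float()
    return bias - gamma * torch.sign(load - load.mean())

n_experts, dim = 64, 128
x = torch.randn(32, dim)
centroids = torch.randn(n_experts, dim)
bias = torch.zeros(n_experts)
chosen, gates = route_tokens(x, centroids, bias, top_k=8)
bias = update_bias(bias, chosen, n_experts)
```

Keeping the bias out of the gate values is the point of the aux-loss-free scheme: load is balanced by nudging which experts get selected, without distorting the weights used to mix their outputs and without an extra loss term.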
#05 DeepSeek-V2.5
DeepSeek-V2.5 is an upgraded version that merges DeepSeek-V2-Chat and DeepSeek-Coder-V2-Instruct, integrating general and coding abilities. It aligns better with human preferences and has been optimized in several areas, including writing and instruction following.
Released: Sep 5, 2024
Scores: 16.8% | - | 89.0% | - | -
#06 DeepSeek-V3-0324
An updated March 2025 checkpoint of DeepSeek-V3 with the same architecture: a Mixture-of-Experts (MoE) language model with 671B total parameters (37B activated per token), featuring Multi-head Latent Attention (MLA), auxiliary-loss-free load balancing, and multi-token prediction training, pre-trained on 14.8T tokens with strong performance in reasoning, math, and code tasks.
Released: Mar 25, 2025
License: MIT + Model License (commercial use allowed)
Scores: - | - | - | 49.2% | -
#07 DeepSeek-R1-Distill-Qwen-1.5B
A dense 1.5B model distilled from DeepSeek-R1: Qwen2.5-Math-1.5B fine-tuned on reasoning traces generated by DeepSeek-R1 (itself built atop DeepSeek-V3, with 671B total parameters and 37B activated per token). The distillation transfers chain-of-thought capabilities for math, code, and multi-step reasoning to a much smaller model.
Released: Jan 20, 2025
Scores: - | - | - | 16.9% | -
#08 DeepSeek-R1-Distill-Qwen-14B
A dense 14B model distilled from DeepSeek-R1: Qwen2.5-14B fine-tuned on reasoning traces generated by DeepSeek-R1. It inherits chain-of-thought capabilities for math, code, and multi-step reasoning from the much larger teacher.
Released: Jan 20, 2025
Scores: - | - | - | 53.1% | -
#09 DeepSeek-R1-Distill-Qwen-32B
A dense 32B model distilled from DeepSeek-R1: Qwen2.5-32B fine-tuned on reasoning traces generated by DeepSeek-R1. The largest of the Qwen-based distillations, it inherits chain-of-thought capabilities for math, code, and multi-step reasoning.
Released: Jan 20, 2025
Scores: - | - | - | 57.2% | -
#10 DeepSeek-R1-Distill-Qwen-7B
A dense 7B model distilled from DeepSeek-R1: Qwen2.5-Math-7B fine-tuned on reasoning traces generated by DeepSeek-R1. It inherits chain-of-thought capabilities for math, code, and multi-step reasoning from the much larger teacher (see the distillation sketch below).
Released: Jan 20, 2025
Scores: - | - | - | 37.6% | -
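All four distilled checkpoints follow the same recipe: a Qwen2.5 base model fine-tuned with plain supervised learning on reasoning traces sampled from DeepSeek-R1. A minimal sketch of one such distillation step, assuming the Qwen base model name is as published; the training example is a hypothetical placeholder, and the real run used roughly 800K curated samples:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Distillation here is ordinary SFT: the student trains on text generated
# by the teacher, with no KL term against the teacher's logits.
student_name = "Qwen/Qwen2.5-Math-1.5B"  # base of DeepSeek-R1-Distill-Qwen-1.5B
tokenizer = AutoTokenizer.from_pretrained(student_name)
student = AutoModelForCausalLM.from_pretrained(student_name)
optimizer = torch.optim.AdamW(student.parameters(), lr=1e-5)

# Hypothetical teacher-generated trace standing in for the curated corpus.
trace = "Question: ...\n<think>step-by-step reasoning...</think>\nAnswer: ..."
batch = tokenizer(trace, return_tensors="pt", truncation=True, max_length=2048)

outputs = student(**batch, labels=batch["input_ids"])  # causal-LM cross-entropy
outputs.loss.backward()
optimizer.step()
```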