#01 Qwen3-Next-80B-A3B-Instruct is the first model in the Qwen3-Next series, featuring groundbreaking architectural innovations: Hybrid Attention combining Gated DeltaNet and Gated Attention for efficient ultra-long-context modeling, High-Sparsity MoE with 512 experts (10 activated + 1 shared) for an extremely low activation ratio, and Multi-Token Prediction for improved performance and faster inference. With 80B total parameters and only 3B activated, it outperforms Qwen3-32B-Base at roughly 10% of the training cost while delivering about 10x the throughput for contexts beyond 32K tokens. The model performs on par with Qwen3-235B-A22B-Instruct-2507 while excelling at ultra-long-context tasks up to 256K tokens (extensible to 1M with YaRN). Architecture: 48 layers, trained on 15T tokens, hybrid layout of 12*(3*(Gated DeltaNet->MoE)->(Gated Attention->MoE)). Released: Sep 10, 2025. Reported benchmark score: 49.8%.
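The 48-layer figure follows directly from the stated layout: 12 repetitions of a block with three (Gated DeltaNet -> MoE) layers followed by one (Gated Attention -> MoE) layer. A minimal sketch of that composition, using purely illustrative layer labels rather than real module names:

```python
# Illustrative sketch of the stated hybrid layout:
# 12 * (3 * (Gated DeltaNet -> MoE) -> (Gated Attention -> MoE))
# The string labels are descriptive only, not actual module classes.

def build_hybrid_layout(num_blocks: int = 12) -> list[str]:
    layers = []
    for _ in range(num_blocks):
        layers += ["gated_deltanet+moe"] * 3   # linear-attention layers
        layers += ["gated_attention+moe"]      # full-attention layer
    return layers

layout = build_hybrid_layout()
assert len(layout) == 48                          # 48 layers total
assert layout.count("gated_deltanet+moe") == 36   # 75% Gated DeltaNet layers
assert layout.count("gated_attention+moe") == 12  # 25% Gated Attention layers
```

The 75%/25% split matches the layer proportions quoted for the Base model below.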
#02 Qwen3-Next-80B-A3B-Thinking is the thinking variant of the Qwen3-Next series, built on the same groundbreaking architecture as the instruct model. Leveraging GSPO, it addresses the stability and efficiency challenges that hybrid attention plus high-sparsity MoE poses for RL training. It uses Hybrid Attention combining Gated DeltaNet and Gated Attention for efficient ultra-long-context modeling, High-Sparsity MoE with 512 experts (10 activated + 1 shared), and Multi-Token Prediction. With 80B total parameters and only 3B activated, it demonstrates outstanding performance on complex reasoning tasks, outperforming Qwen3-30B-A3B-Thinking-2507, Qwen3-32B-Thinking, and even the proprietary Gemini-2.5-Flash-Thinking across multiple benchmarks. Architecture: 48 layers, trained on 15T tokens, hybrid layout of 12*(3*(Gated DeltaNet->MoE)->(Gated Attention->MoE)). It supports only thinking mode, automatically includes <think> tags, and may generate longer thinking content. Released: Sep 10, 2025.
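Because the model always emits thinking content terminated by a closing </think> tag, downstream code usually separates the reasoning trace from the final answer. A minimal sketch of that post-processing using plain string handling (the official model card parses a special token id instead, so treat this as an approximation):

```python
def split_thinking(completion: str) -> tuple[str, str]:
    """Split a raw completion into (thinking_content, final_answer).

    Assumes the completion contains a closing </think> tag, as produced by
    thinking-only Qwen3 variants; if none is found, the whole text is
    treated as the answer.
    """
    marker = "</think>"
    head, sep, tail = completion.rpartition(marker)
    if not sep:
        return "", completion.strip()
    # Strip an optional opening <think> tag from the reasoning trace.
    thinking = head.replace("<think>", "", 1).strip()
    return thinking, tail.strip()

thinking, answer = split_thinking("<think>Check 2+2.</think>\n4")
print(thinking)  # "Check 2+2."
print(answer)    # "4"
```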
#03 Qwen3-Next-80B-A3B-Base is the foundation model of the Qwen3-Next series, featuring revolutionary architectural innovations for ultimate training and inference efficiency. It introduces Hybrid Attention combining Gated DeltaNet (75% of layers) and Gated Attention (25% of layers) for efficient ultra-long-context modeling, Ultra-Sparse MoE with 512 total experts of which only 10 routed + 1 shared expert are activated (3.7% activation ratio), and native Multi-Token Prediction for faster inference. With 80B total parameters and only ~3B activated per inference step, it achieves performance comparable to Qwen3-32B while using less than 10% of the training cost and delivering over 10x the throughput for contexts beyond 32K tokens. It was trained on 15T tokens with training-stability-friendly designs, including Zero-Centered RMSNorm and normalized MoE router parameters, and supports a 256K context length, extensible to 1M tokens with YaRN scaling. Released: Sep 10, 2025.
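The quoted 3.7% activation ratio is consistent with reading it as activated parameters over total parameters, and extending the native 256K window to 1M tokens corresponds to roughly a 4x YaRN scaling factor. A back-of-the-envelope check; the rounded parameter counts and the exact token counts below are assumptions, not official figures:

```python
# Rough sanity checks; parameter counts are rounded marketing figures,
# so the results are approximations rather than official numbers.

total_params = 80e9        # ~80B total parameters
active_params = 3e9        # ~3B activated per inference step
activation_ratio = active_params / total_params
print(f"activation ratio ≈ {activation_ratio:.1%}")   # ≈ 3.8%, close to the quoted 3.7%

native_ctx = 262_144       # 256K native context length
target_ctx = 1_000_000     # 1M tokens with YaRN
yarn_factor = target_ctx / native_ctx
print(f"required YaRN scaling factor ≈ {yarn_factor:.2f}")  # ≈ 3.81, i.e. about 4x
```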
#04 Qwen3-235B-A22B-Thinking-2507 is a state-of-the-art thinking-enabled Mixture-of-Experts (MoE) model with 235B total parameters (22B activated). It features 94 layers, 128 experts (8 activated), and a native context length of 262K tokens. This version delivers significantly improved reasoning performance, achieving state-of-the-art results among open-source thinking models on logical reasoning, mathematics, science, coding, and academic benchmarks. Key enhancements include markedly better general capabilities (instruction following, tool usage, text generation), enhanced 256K long-context understanding, and increased thinking depth. The model supports only thinking mode and automatically includes <think> tags. Released: Jul 25, 2025.
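The "128 experts (8 activated)" figure describes top-k expert routing: a gate scores every expert for each token and only the 8 highest-scoring experts run, which is what keeps activated parameters far below the total. A schematic sketch of the generic top-k softmax gate (this illustrates the common MoE pattern, not Qwen's actual routing code):

```python
import numpy as np

def topk_gate(router_logits: np.ndarray, k: int = 8) -> tuple[np.ndarray, np.ndarray]:
    """Generic top-k softmax gate for a single token.

    router_logits: shape (num_experts,) scores from the routing layer.
    Returns (expert_ids, weights) for the k selected experts; all other
    experts are skipped for this token.
    """
    expert_ids = np.argsort(router_logits)[-k:]   # indices of the k best experts
    selected = router_logits[expert_ids]
    weights = np.exp(selected - selected.max())
    weights /= weights.sum()                      # softmax over the selected experts
    return expert_ids, weights

rng = np.random.default_rng(0)
ids, w = topk_gate(rng.normal(size=128), k=8)     # 128 experts, 8 activated per token
print(ids, w.round(3))
```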
#05 Qwen3-235B-A22B-Instruct-2507 is the updated instruct version of Qwen3-235B-A22B, featuring significant improvements in general capabilities, including instruction following, logical reasoning, text comprehension, mathematics, science, coding, and tool usage. It provides substantial gains in long-tail knowledge coverage across multiple languages and markedly better alignment with user preferences in subjective and open-ended tasks. Released: Jul 22, 2025. Reported benchmark score: 57.3%.
#06 Qwen3-32B is a large language model from Alibaba's Qwen3 series. It features 32.8 billion parameters, a 128K-token context window, support for 119 languages, and hybrid thinking modes that allow switching between deep reasoning and fast responses. It demonstrates strong performance in reasoning, instruction following, and agent capabilities. Released: Apr 29, 2025. Reported benchmark score: 65.7%.
#07 Qwen3-30B-A3B is a smaller Mixture-of-Experts (MoE) model from Alibaba's Qwen3 series, with 30.5 billion total parameters and 3.3 billion activated parameters. It features hybrid thinking/non-thinking modes, support for 119 languages, and enhanced agent capabilities, and it aims to outperform previous models such as QwQ-32B while using significantly fewer activated parameters. Released: Apr 29, 2025. Reported benchmark score: 62.6%.
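The hybrid thinking/non-thinking behaviour mentioned for Qwen3-32B and Qwen3-30B-A3B is exposed through the chat template; the Qwen3 model cards document an enable_thinking switch on apply_chat_template. A hedged sketch of toggling it with Hugging Face transformers (model id and parameter usage assumed from those cards; verify against the current documentation):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-30B-A3B")
messages = [{"role": "user", "content": "Give me a short introduction to MoE models."}]

# Thinking mode: the template leaves room for a <think>...</think> block.
prompt_thinking = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,   # Qwen3 switch documented in the model cards
)

# Non-thinking mode: fast responses without the reasoning block.
prompt_direct = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,
)
```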
#08 Qwen3-235B-A22B is a large language model developed by Alibaba, featuring a Mixture-of-Experts (MoE) architecture with 235 billion total parameters and 22 billion activated parameters. It achieves competitive results against other top-tier models in benchmark evaluations of coding, math, general capabilities, and more. Released: Apr 29, 2025. Reported benchmark scores: 70.7% and 81.4%.
#09 Qwen2.5-Omni-7B is the flagship end-to-end multimodal model in the Qwen series. It processes diverse inputs including text, images, audio, and video, and delivers real-time streaming responses through text generation and natural speech synthesis using a novel Thinker-Talker architecture. Released: Mar 27, 2025. Reported benchmark scores: 78.7% and 73.2%.
#10 QwQ-32B is a model focused on advancing AI reasoning capabilities, particularly excelling in mathematics and programming. It features deep introspection and self-questioning abilities, while showing some limitations around language mixing and recursive or endless reasoning patterns. Released: Mar 5, 2025. Reported benchmark score: 63.4%.