Alibaba’s Qwen3.5 Small Models Democratize Agentic AI
Despite ongoing developments in the U.S. AI sector, China continues to make significant strides in artificial intelligence. Today, Alibaba’s Qwen Team unveiled its newest batch of models, the Qwen3.5 Small Model Series, designed for efficiency and accessibility. The series comprises four models: Qwen3.5-0.8B, Qwen3.5-2B, Qwen3.5-4B, and Qwen3.5-9B.
Introducing the Qwen3.5 Small Model Series
The Qwen3.5 Small Model Series is tailored for a range of applications, from prototyping on edge devices to powering lightweight agents. The models include:
- Qwen3.5-0.8B & 2B: Optimized for “tiny” and “fast” performance, ideal for prototyping and deployment on battery-powered edge devices.
- Qwen3.5-4B: A strong multimodal base for lightweight agents, supporting a 262,144 token context window.
- Qwen3.5-9B: A compact reasoning model that outperforms OpenAI’s gpt-oss-120b on multilingual knowledge and graduate-level reasoning benchmarks.
These models are comparable in size to the LFM2 series from MIT offshoot LiquidAI, and much smaller than the flagship models from OpenAI, Anthropic, and Google.
The model weights are available under the Apache 2.0 license on Hugging Face and ModelScope, enabling commercial and enterprise use.
Technical Foundations: Hybrid Efficiency and Native Multimodality
The Qwen3.5 small series utilizes an Efficient Hybrid Architecture combining Gated Delta Networks (a form of linear attention) with sparse Mixture-of-Experts (MoE). This approach addresses the limitations of small models by achieving higher throughput and lower latency during inference.
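For readers unfamiliar with Gated Delta Networks, here is a minimal sketch of a gated delta rule update as it is usually described in the linear-attention literature. It is an illustration under assumptions, not Qwen3.5's actual implementation: the function names (`gated_delta_step`, `run_sequence`), the scalar gates `alpha` and `beta`, and the toy dimensions are all hypothetical. What it shows is that this style of layer keeps a fixed-size memory matrix rather than a cache that grows with every token, which is the usual argument for its throughput and latency advantages on long inputs.

```python
def gated_delta_step(S, q, k, v, alpha, beta):
    """One recurrent step of a gated delta rule (illustrative only).

    S     : memory matrix, d_v rows by d_k columns (list of lists)
    q, k  : query and key vectors of length d_k (k assumed unit-norm)
    v     : value vector of length d_v
    alpha : decay gate in [0, 1]; smaller values forget old memory faster
    beta  : write strength in [0, 1]
    """
    # What the memory currently returns for this key: S @ k.
    pred = [sum(s * kj for s, kj in zip(row, k)) for row in S]
    # S_new = alpha * (S - beta * (S k) k^T) + beta * v k^T
    new_S = [
        [alpha * (row[j] - beta * pred[i] * k[j]) + beta * v[i] * k[j]
         for j in range(len(k))]
        for i, row in enumerate(S)
    ]
    # Read-out for the current query: S_new @ q.
    out = [sum(s * qj for s, qj in zip(row, q)) for row in new_S]
    return new_S, out


def run_sequence(steps, d_k, d_v):
    """Process (q, k, v, alpha, beta) tuples; the state never grows."""
    S = [[0.0] * d_k for _ in range(d_v)]
    outputs = []
    for q, k, v, alpha, beta in steps:
        S, out = gated_delta_step(S, q, k, v, alpha, beta)
        outputs.append(out)
    return S, outputs


key = [1.0, 0.0]
steps = [
    (key, key, [2.0, 3.0], 1.0, 1.0),  # store value [2, 3] under `key`
    (key, key, [5.0, 7.0], 1.0, 1.0),  # overwrite it with [5, 7]
]
final_state, outputs = run_sequence(steps, d_k=2, d_v=2)
print(outputs)  # [[2.0, 3.0], [5.0, 7.0]]
```

In this toy run the second write replaces the first value stored under the same key instead of adding to it, and the memory stays a 2 by 2 matrix regardless of how many steps are processed. A production layer would use learned, per-token gates and a chunked parallel form rather than this step-by-step loop.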
Unlike previous generations, Qwen3.5 was trained using early fusion on multimodal tokens, enabling the 4B and 9B models to exhibit a level of visual understanding—such as reading UI elements or counting objects in a video—that previously required larger models.
Benchmarking Performance
Newly released benchmark data demonstrates the competitive performance of these compact models against larger industry standards:
- Multimodal Reasoning: Qwen3.5-9B achieved a score of 70.1 on the MMMU-Pro benchmark, surpassing Gemini 2.5 Flash-Lite (59.7) and Qwen3-VL-30B-A3B (63.0).
- Graduate-Level Reasoning: The 9B model reached a score of 81.7 on the GPQA Diamond benchmark, exceeding gpt-oss-120b (80.1).
- Video Understanding: Qwen3.5-9B scored 84.5 and the 4B scored 83.5 on the Video-MME benchmark, significantly outperforming Gemini 2.5 Flash-Lite (74.6).
- Mathematical Prowess: The 9B model scored 83.2 and the 4B variant scored 74.0 on the HMMT Feb 2025 evaluation.
- Document & Multilingual Knowledge: The 9B variant scored 87.7 on OmniDocBench v1.5 and 81.2 on MMMLU, the latter ahead of gpt-oss-120b (78.2).
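To put the reported gaps side by side, the short sketch below computes the 9B model's margin over each named rival using only the figures quoted above. The names `REPORTED` and `margins` are hypothetical, and pairing the 78.2 gpt-oss-120b score with MMMLU is an assumption based on how the comparison is worded.

```python
# Scores as quoted in this article: (benchmark, Qwen3.5-9B, rival, rival score).
# Assumption: the 78.2 figure for gpt-oss-120b refers to MMMLU.
REPORTED = [
    ("MMMU-Pro", 70.1, "Gemini 2.5 Flash-Lite", 59.7),
    ("MMMU-Pro", 70.1, "Qwen3-VL-30B-A3B", 63.0),
    ("GPQA Diamond", 81.7, "gpt-oss-120b", 80.1),
    ("Video-MME", 84.5, "Gemini 2.5 Flash-Lite", 74.6),
    ("MMMLU", 81.2, "gpt-oss-120b", 78.2),
]


def margins(rows):
    """Return (benchmark, rival, point gap) with the gap rounded to 0.1."""
    return [
        (bench, rival, round(ours - theirs, 1))
        for bench, ours, rival, theirs in rows
    ]


for bench, rival, gap in margins(REPORTED):
    print(f"{bench:13s} +{gap:.1f} vs {rival}")
```

Read this way, the lead over Gemini 2.5 Flash-Lite on the multimodal benchmarks is roughly ten points, while the lead over gpt-oss-120b on GPQA Diamond is 1.6 points, so the text-reasoning comparison is much closer than the multimodal one.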
Community Reaction: “More Intelligence, Less Compute”
The release of the Qwen3.5 Small Model Series has sparked interest among developers focused on “local-first” AI. The models’ smaller footprint and processing requirements resonate with users seeking alternatives to cloud-based models.
AI educator Paul Couvert noted the efficiency leap, stating that the 4B version is almost as capable as the previous 80B A3B model, and that the 9B model is comparable to gpt-oss-120b while being 13x smaller. Developers like Karan Kendre have highlighted the ability to run these models locally on devices like M1 MacBook Airs.
Researchers have also praised the release of Base models alongside the Instruct versions, since the Base models provide a “blank slate” for customization without the biases of specific RLHF or SFT data.
Licensing and Commercial Use
Alibaba has released the weights and configuration files under the Apache 2.0 license, allowing for commercial use, modification, and distribution without royalty payments.
The Shift to Agentic AI
The Qwen3.5 series arrives at a time when the focus is shifting from simple chatbots to autonomous agents. These models, with their reasoning, multimodality, and tool use capabilities, can perform tasks for a fraction of the cost of larger models.
Strategic Enterprise Applications
The 0.8B to 9B models are designed for efficiency, activating only the necessary parts of the network for each task. Potential applications include:
- Visual Workflow Automation: Navigating desktop or mobile UIs, filling out forms, and organizing files.
- Complex Document Parsing: Extracting structured data from diverse forms and charts.
- Autonomous Coding & Refactoring: Refactoring or debugging code repositories.
- Real-Time Edge Analysis: Offline video summarization and spatial reasoning on mobile devices.
Operational Considerations
While the models are highly capable, teams should watch for potential issues such as the “hallucination cascade” in multi-step workflows, difficulty debugging complex code, and VRAM demands. Data residency concerns may also arise because the models originate from a China-based provider.
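As a rough guide to the VRAM point, the sketch below estimates the memory needed just to hold each model's weights at common precisions. It is a back-of-the-envelope calculation under stated assumptions: parameter counts are taken from the model names rather than from published exact counts, the names `MODEL_PARAMS_B`, `BYTES_PER_PARAM`, and `weight_gb` are hypothetical, and the result excludes attention caches, activations, and runtime overhead, all of which add to the real footprint.

```python
# Nominal sizes in billions of parameters, read off the model names
# (assumption: the exact published counts may differ somewhat).
MODEL_PARAMS_B = {
    "Qwen3.5-0.8B": 0.8,
    "Qwen3.5-2B": 2.0,
    "Qwen3.5-4B": 4.0,
    "Qwen3.5-9B": 9.0,
}

# Storage cost per parameter at common precisions.
BYTES_PER_PARAM = {"16-bit": 2.0, "8-bit": 1.0, "4-bit": 0.5}


def weight_gb(params_billions, bytes_per_param):
    """Weights-only footprint in GB (1 GB = 1e9 bytes)."""
    return params_billions * bytes_per_param


for model, size in MODEL_PARAMS_B.items():
    row = ", ".join(
        f"{name}: {weight_gb(size, cost):.1f} GB"
        for name, cost in BYTES_PER_PARAM.items()
    )
    print(f"{model:13s} {row}")
```

By this estimate the 9B model needs about 18 GB for 16-bit weights and about 4.5 GB at 4-bit, so whether it fits on a given laptop or edge device depends heavily on the quantization chosen and on how much context is kept in memory.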