Qwen 3.5, the 100% free Chinese alternative to ChatGPT

Qwen 3.5 runs on a desktop PC and outperforms GPT-5 Nano on the majority of benchmarks. All open source, and without spending a cent.

In the AI race, China is hard on the Americans’ heels. Month after month, Chinese laboratories (Moonshot, MiniMax, DeepSeek) deploy models at a sustained pace. Their strategy is now clear: against American proprietary models, they bet on high-end open-source models whose benchmark results keep closing in on the sector’s leaders. Latest example: the AI laboratory of the giant Alibaba has unveiled a new version of its flagship model, Qwen.

Version 3.5 was announced in February with a range going from 397 billion parameters in its most massive version down to a 27B version, passing through intermediate models at 122 and 35 billion parameters (the latter, a MoE architecture with 3 billion active parameters, remains resource-intensive, requiring more than 22 GB of VRAM). At the beginning of March, the family was enriched with four new models: 9B, 4B, 2B and 0.8B, designed for inference on a PC without an excessive hardware configuration. These models not only run on small machines but manage to match, or even surpass, several proprietary models that were still benchmarks a few months ago. So, should you swap your ChatGPT subscription for an open-source Chinese model that runs locally on your machine?

Qwen 3.5, a model designed for local execution

The latest update to Qwen 3.5, on March 2, brings four new models: Qwen 3.5 0.8B, Qwen 3.5 2B, Qwen 3.5 4B and Qwen 3.5 9B. On the architectural side, Qwen is not content with shrinking its giant model. The smaller versions inherit a key innovation from the series: a hybrid attention system that alternates between two mechanisms. Out of four consecutive processing steps, three use “linear attention”, which is much less computationally intensive, and only one uses traditional attention, which is more precise but more costly in resources. Concretely, Alibaba managed to significantly compress the resources necessary to run the model without compromising the quality of its responses.
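The 3-to-1 alternation described above can be sketched as a simple layer schedule. This is an illustrative sketch, not Alibaba’s actual code; the assumption that the full-attention layer is the fourth in each group is hypothetical.

```python
def attention_schedule(num_layers: int) -> list[str]:
    """Hypothetical sketch of Qwen 3.5's hybrid attention pattern:
    out of every four consecutive layers, three use cheap linear
    attention and one uses full (quadratic) attention."""
    schedule = []
    for layer in range(num_layers):
        # Assumption: every fourth layer (indices 3, 7, 11, ...) is full attention
        if layer % 4 == 3:
            schedule.append("full")
        else:
            schedule.append("linear")
    return schedule

print(attention_schedule(8))
# ['linear', 'linear', 'linear', 'full', 'linear', 'linear', 'linear', 'full']
```

Since linear attention scales with sequence length rather than its square, keeping only one full-attention layer in four is what makes very long contexts affordable on consumer hardware.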

All published versions are also natively multimodal. Unlike other models which add a vision encoder after the fact, Qwen 3.5 integrates visual understanding from its design: text, images and video are processed within the same neural network, without distinction. However, the model only produces text as output. Context-wise, Qwen 3.5 claims a window of 262,000 tokens natively, the equivalent of a 500-page novel processed in one go. It is even possible to push this limit up to one million tokens (around 2 hours of video) with a slight loss of precision, via YaRN (a technique for mathematically adjusting the context size).
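In practice, YaRN-style context extension is usually enabled through a `rope_scaling` entry in the model configuration, as with earlier Qwen releases in Hugging Face transformers. A minimal sketch, assuming that convention applies here; the factor of 4 simply follows from the figures above (262K native, ~1M extended):

```python
# Sketch: extending the context window with YaRN via a rope_scaling
# config entry. Field names follow the Hugging Face transformers
# convention used for previous Qwen models; values are illustrative.
native_context = 262_144      # ~262K tokens claimed natively
target_context = 1_048_576    # ~1M tokens with YaRN

rope_scaling = {
    "rope_type": "yarn",
    "factor": target_context / native_context,  # 4x extension
    "original_max_position_embeddings": native_context,
}

print(rope_scaling["factor"])  # 4.0
```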

A model above GPT-5 Nano on several benchmarks

The results in the benchmarks are surprising and are the real strength of the model. On vision and multimodal reasoning tasks, Qwen 3.5-9B largely dominates GPT-5 Nano from OpenAI and Gemini 2.5 Flash-Lite from Google. In document comprehension (OmniDocBench), version 9B displays 87.7 compared to 55.9 for the OpenAI model. Same observation in video understanding, spatial intelligence or medical VQA: Alibaba’s small model crushes its proprietary competitors almost across the board.

Even more remarkable, on the textual benchmarks the 9B outperforms GPT-OSS-120B, OpenAI’s open-source model, which nevertheless weighs… 120 billion parameters, thirteen times more. This is the case in scientific reasoning (GPQA Diamond: 81.7 versus 80.1), in general knowledge (MMLU-Pro: 82.5 versus 80.8) and in long-context understanding (LongBench v2: 55.2 versus 48.2). The more modest 4B version also stays above GPT-5 Nano and Gemini Flash-Lite on the majority of vision benchmarks, which makes it a very credible option for the most constrained configurations. Qwen 3.5 falls short in code, however (LiveCodeBench): the 9B tops out at 65.6 versus 82.7 for GPT-OSS-120B; size still matters for complex coding tasks. The same gap appears in advanced mathematical competitions (HMMT): 83.2 versus 90.0 for the OpenAI model.

Clearly, for classic uses (document analysis, visual reasoning, multilingual understanding, agents), Qwen 3.5-9B plays in the big leagues. But for high-level coding and competitive math, heavier models stay ahead.

A simple, easy-to-install model

Released under the Apache 2.0 license, Qwen 3.5 is completely free to use, including commercially (and that’s notable). In Q4 quantization (a compression of the model’s weights), the 9B requires approximately 6 GB of VRAM, the 4B around 3 GB, the 2B around 1.5 GB, and the 0.8B less than a gigabyte. The smallest version can even run inference on a recent smartphone without much latency.
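Those footprints are consistent with a rough back-of-the-envelope estimate. Assuming about 4.5 bits per parameter at Q4 (4-bit weights plus quantization scales, an assumption), the weights alone come out slightly below the figures above; the KV cache and runtime buffers account for the rest:

```python
def approx_vram_gb(params_billion: float, bits_per_param: float = 4.5) -> float:
    """Rough VRAM estimate for the weights of a Q4-quantized model.
    bits_per_param=4.5 is an assumption: 4-bit weights plus the
    quantization scales stored alongside them."""
    bytes_total = params_billion * 1e9 * bits_per_param / 8
    return round(bytes_total / 1e9, 1)

# Weights-only estimates for the four small Qwen 3.5 variants
for size in (9, 4, 2, 0.8):
    print(f"{size}B -> ~{approx_vram_gb(size)} GB")
```

Add roughly a gigabyte for the KV cache and runtime overhead and you land on the requirements quoted above.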

To test Qwen 3.5 on your machine, the easiest way is LM Studio, available on Windows, macOS and Linux. Once the software is installed, simply type “Qwen3.5” into the search bar, choose the desired version (9B, 4B, 2B or 0.8B) and a quantization level suited to your configuration (Q4 for minimal VRAM, Q6 or Q8 for more precision), then click “Download”.

In a few minutes, the model is operational: an integrated chat interface allows you to query it directly, text and images included. No command line, no technical configuration. Everything works offline, your data remains on your machine.
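For programmatic use, LM Studio can also expose a local OpenAI-compatible server (by default on http://localhost:1234). A minimal sketch using only the standard library; the model identifier `qwen3.5-9b` is an assumption and should be replaced by whatever name LM Studio reports for the loaded model:

```python
import json
import urllib.request

def build_payload(prompt: str, model: str = "qwen3.5-9b") -> dict:
    """Build an OpenAI-style chat completion request body.
    The model name is an assumption; use the identifier LM Studio shows."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }

def ask_local_model(prompt: str,
                    url: str = "http://localhost:1234/v1/chat/completions") -> str:
    """Send the prompt to LM Studio's local server and return the reply."""
    req = urllib.request.Request(
        url,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# Example (requires LM Studio's local server to be running):
# print(ask_local_model("Summarize the attached document in three bullet points."))
```

Because the endpoint mimics the OpenAI API, existing scripts can often be pointed at the local server just by changing the base URL, with no cloud account involved.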

It’s hard not to be impressed by what Alibaba offers with Qwen 3.5: a multimodal, open-source model that runs on a desktop PC and stands up to proprietary models thirteen times heavier. For developers, for companies keen to keep their data local, or simply for the curious who want to test AI without pulling out a credit card, the proposition is hard to ignore. The fact remains that the Chinese open-source AI landscape is evolving at a speed that makes any prediction risky. DeepSeek is preparing a V4 that could once again shake things up in the coming days…

Jake Thompson
Growing up in Seattle, I've always been intrigued by the ever-evolving digital landscape and its impacts on our world. With a background in computer science and business from MIT, I've spent the last decade working with tech companies and writing about technological advancements. I'm passionate about uncovering how innovation and digitalization are reshaping industries, and I feel privileged to share these insights through MeshedSociety.com.
