Alibaba's powerful multimodal model Qwen3-VL
12:48, 15.10.2025
The new Qwen3-VL model was recently released by Alibaba. It is available in two versions and works with images and text, supporting content of up to 256,000 tokens, with the length extendable to 1 million.
Key features of Qwen3-VL
Open vocabulary support is a key feature of the new model. Qwen3-VL recognizes many details such as architectural objects, logos, consumer goods, and much more. Therefore, it is possible not only to analyze but also to interpret context.
The project offers two main modes of operation: Thinking and Instruct. Thinking is used for more complex computational tasks that require a step-by-step process. Instruct is used to generate interactive methods such as code, text, or simple data analysis.
The OCR system is trained on poor-quality scanned information. The model can easily recognize data from tilted or slightly blurred scans and supports 32 languages.
Qwen3-VL is available under the Apache 2.0 license, making it the most accessible and powerful open source option. The code is already available on Hugging Face, and integration of the model with ModelScope and AI Workspace services is soon to be expected.