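Qwen VLo: From “Understanding” the World to “Depicting” It

Introduction

The evolution of multimodal large models keeps pushing the boundaries of what we thought technology could do. From the original QwenVL to the current Qwen2.5 VL, we have made progress in improving the model’s understanding of image content. Today we are officially launching Qwen VLo, a unified multimodal understanding and generation model. This upgraded model can not only “understand” the world but also produce high-quality re-creations based on that understanding, bridging the gap between perception and generation. Note that this is a preview version, and you can access it through Qwen Chat. You can send a prompt such as “Generate a picture of a cute cat” to generate an image, or upload a picture of a cat and ask “Add a hat on the cat’s head” to modify it. The image generation process is shown below: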

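Generation process: unleash your imagination and turn your ideas into reality

As the video of the generation process shows, Qwen VLo generates progressively, building up the whole picture from left to right and top to bottom, with the image becoming clearer step by step. During generation, the model continually adjusts and refines its predictions so that the final result is more harmonious and consistent. This mechanism improves visual quality and gives users a more flexible, more controllable creative experience.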

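From Understanding to Creation: More Precise Multimodal Generation

Qwen VLo is a comprehensive upgrade of the original multimodal understanding and generation capabilities. It understands image content in significantly greater depth and, on that basis, generates more accurate and consistent results. The core highlights of Qwen VLo are: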

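More precise content understanding and re-creation

Earlier multimodal models were prone to semantic inconsistencies during generation, such as turning a car into a different kind of object or failing to preserve the key structural features of the original image. Qwen VLo captures detail more effectively and so maintains a high degree of semantic consistency throughout generation. For example, when a user provides a photo of a car and asks to “change the color,” Qwen VLo accurately identifies the car model, preserves its original structure, and makes the color change look natural, so the result meets expectations while staying realistic.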

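Open-ended instruction-based editing and generation

Users can give all kinds of creative instructions in natural language, such as “change this painting to Van Gogh style,” “make this photo look like an old photograph from the 19th century,” or “add a clear sky to this picture.” Qwen VLo responds flexibly to such open-ended instructions and generates results that match the user’s expectations. It handles artistic style transfer, scene reconstruction, and detail retouching alike. Even traditional visual perception tasks, such as predicting depth maps, segmentation maps, detection maps, and edge information, can be carried out through editing instructions. It can also follow more complex instructions, for example a single instruction that modifies an object, edits text, and replaces the background all at once.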

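Multilingual instruction support

Qwen VLo supports instructions in multiple languages, including Chinese and English, breaking down language barriers and giving users worldwide a unified and convenient way to interact. Whichever language you use, simply describe what you need and the model will quickly understand it and produce a fitting result.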

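Examples

Qwen VLo works more like a human artist, creating from its own understanding. Here are some concrete examples.

Qwen VLo can generate images directly and then modify them, for example by replacing the background, adding subjects, or transferring styles. It can even make large modifications based on open-ended instructions, including visual perception tasks such as detection and segmentation.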

A cute Shiba Inu

User

生成一个可爱的柴犬

Translation: Generate a cute Shiba Inu

Qwen-VLo

User

背景改成草原

Translation: Change the background to a grassland

Qwen-VLo

User

给它带上红色帽子和黑色透明墨镜,帽子上写着“QwenVLo”

Translation: Put a red hat and black transparent sunglasses on it, with ‘QwenVLo’ written on the hat

Qwen-VLo

User

变成吉卜力风格

Translation: Switch to Ghibli style

Qwen-VLo

User

变成3d Q版风格

Translation: Switch to 3D Q-version style

Qwen-VLo

User

把它放到水晶球里

Translation: Place it inside a crystal ball

Qwen-VLo

User

桌面上摆着这个水晶球，生成以一个人的第一视角在公园的圆形咖啡桌上在笔记本上画画

Translation: Place this crystal ball on a desk and generate an image from a first-person perspective of someone drawing on a notebook placed on a round coffee table in a park

Qwen-VLo

User

用蓝色的蒙版检测框框出图中的笔

Translation: Use a blue mask to detect and frame the pen in the picture

Qwen-VLo

User

用粉色的mask分割出图中的狗狗边缘

Translation: Use a pink mask to segment the edge of the dog in the picture

Qwen-VLo

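Qwen VLo re-creates images based on its own understanding, which gives it more room to work with in style conversion and transfer, for example turning a cartoon into a realistic photo or turning a character into a balloon.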

Style Conversion

User

变成真实照片
Translation: Turn into a real photo

Qwen-VLo

User

背景换成艾弗尔铁塔
Translation: Change the background to the Eiffel Tower

Qwen-VLo

User

变成气球飘到空中
Translation: Turn into a balloon floating in the air

Qwen-VLo

User

把西瓜换成榴莲
Translation: Replace the watermelon with durian

Qwen-VLo

Style Conversion

User

Convert the couple in the photo into a minimalist flat illustration style Q-version sticker, retaining the facial features, with thick white borders, figures extending beyond the circular area, the circular area filled with a solid color, transparent background, and an overall cute style.

Qwen-VLo

User

Convert the couple in the photo into a detailed, exquisite, and adorable 3D rendered collectible figurine, placed inside a cozy and romantic display box decorated with light cream and soft golden accents, while preserving the characteristics of the snowy scene and the figures’ poses.

Qwen-VLo

User

Add the text “Happy Wedding”

Qwen-VLo

Style Conversion

User

变成吉卜力风格

Translation: Turn into Ghibli style

Qwen-VLo

User

变成海贼王画风

Translation: Turn into One Piece art style

Qwen-VLo

User

变成龙珠画风

Translation: Turn into Dragon Ball art style

Qwen-VLo

User

变成海绵宝宝风格

Translation: Turn into SpongeBob style

Qwen-VLo

User

变成我的世界里的角色

Translation: Turn into a character from Minecraft

Qwen-VLo

User

变成像素画风格

Translation: Turn into pixel art style

Qwen-VLo

Style Conversion

User

将照片中的猫转换为皮克斯 3D 风格，保持蓝眼睛和白橙色毛发特征，背景为红色树叶的秋日氛围。

Translation: Transform the cat in the photo into Pixar 3D style, maintaining the features of blue eyes and white-orange fur, with the autumn atmosphere of red leaves in the background.

Qwen-VLo

Style Conversion

User

将迈阿密夜景的繁华城市天际线转换为果冻质感（布丁造型）的特殊材质建模风格，主题为现代都市，添加透明果冻层叠加的效果和微妙的弹性物理模拟。

Translation: Transform the bustling city skyline of Miami’s night view into a special material modeling style with a jelly-like texture (pudding shape), with the theme of modern city. Add the effect of transparent jelly layer superimposition and subtle elastic physical simulation.

Qwen-VLo

Style Conversion

User

将照片中的女性主体转换为细致精美的3D渲染收藏摆件，装置在浅奶油色与柔和金色装饰的温馨展示盒中，保持优雅姿态与服饰细节，采用萌趣可爱的风格表现。

Translation: Convert the female subject in the photo to a detailed and exquisite 3D rendered collectible figure, displayed in a warm and cozy display box with a shallow cream color and soft golden decoration, maintaining an elegant pose and clothing details, presented in a cute and adorable style.

Qwen-VLo

Style Conversion

User

将铁王座转换为Jellycat毛绒玩具风格，保持剑刃纹理细节，采用柔和的毛绒材质渲染。

Translation: Transform the Iron Throne into the Jellycat plush toy style, retain the texture details of the sword blade, and use soft plush material for rendering.

Qwen-VLo

Style Conversion

User

生成一条可爱的锦鲤，背景是西湖雷峰塔

Translation: Generate a cute koi with the Leifeng Pagoda in West Lake as the background

Qwen-VLo

User

把图片转化为乐高人偶风格，模块化积木造型，保持鱼的廓形，采用简洁的背景突出主体，添加柔光渲染效果

Translation: Transform the picture into a Lego figure style, with modular building block shapes, maintain the outline of the fish, use a simple background to highlight the main subject, and add soft light rendering effects

Qwen-VLo

Style Conversion

User

将照片中的红色跑车转换为细致精美的3D渲染收藏摆件，装置在浅奶油色与柔和金色装饰的粉彩展示盒中，保持跑车的红色条纹特征，采用萌趣可爱风格。

Translation: Transform the red sports car in the photo into a detailed and exquisite 3D rendered collectible ornament, installed in a pastel display box decorated with light cream and soft gold, maintaining the red stripe feature of the sports car and adopting a cute and playful style.

Qwen-VLo

User

把图中的跑车变成毛绒玩具

Translation: Turn the car in the picture into a plush toy.

Qwen-VLo

User

再把跑车变成奶油布丁风格的

Translation: Turn the car into a cream pudding style.

Qwen-VLo

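Qwen VLo’s strength in understanding images and instructions lets it parse complex instructions better. A single instruction can contain multiple operations and modifications, so several tasks can be completed in one pass, such as generating posters or combining objects.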

Complex Image Prompt

User

Generate a photo in which a man wearing a black hat is reading a newspaper on the subway, next to him is a beautiful young woman wearing red sunglasses, and there’s also a Husky dog. The subway window shows the Statue of Liberty, and the subway station sign displays ‘Qwen VLo’.

(翻译：生成照片，图中男人带着黑帽子在地铁上看报纸，旁边是一个美丽的带红色墨镜年轻女性，还有一只哈士奇，地铁的窗外是自由女神像，地铁的站牌显示“Qwen VLo”)

Qwen-VLo

Complex Image Prompt

User

now generate a POV of a person drawing this diagram in their notebook, at a round cafe table in park

Qwen-VLo

User

now show the same scene with a smug young Isaac Newton sitting at the table, with a prism, demonstrating the experiment, without the notebook in view

Qwen-VLo

Poster Your Cat

User

生成一只黑色的小猫

Translation: Generate a black kitten

Qwen-VLo

User

生成一个以这只小黑猫为主体的广角巨幅摄影，主角小黑猫骄傲地站在中间，戴着一顶鲜红色的帽子，为它可爱的外表增添了一丝奇思妙想。在它的左边，动物们成群结队，老虎、狮子、猴子、鹿和天上的飞鸟，它们的眼睛闪烁着凶猛的光芒，散发出一种不可战胜的气势。在右边是一只外星人，还有一个发出耀眼的深蓝色光的飞碟。飞碟发出眩目的光束，与动物大军形成鲜明对比。广角镜头捕捉到了壮观的场景，突出了小猫的勇敢和双方之间的紧张关系。

Translation: Generate a wide-angle large-scale photograph with this little black cat as the main subject. The protagonist, the little black cat, stands proudly in the middle, wearing a bright red hat, adding a touch of ingenuity to its cute appearance. On its left, there are groups of animals, tigers, lions, monkeys, deer and birds in the sky. Their eyes sparkle with fierce light and exude an invincible momentum. On the right is an alien and a flying saucer emitting dazzling deep blue light. The flying saucer emitted dazzling beams of light, in sharp contrast to the army of animals. The wide-angle lens captured the spectacular scene, highlighting the kitten’s bravery and the tension between the two sides.

Qwen-VLo

Poster Generation

User

生成一个关于这个咖啡的宣传海报，咖啡罐是主体，放在木质桌面上，有一种自然复古感。咖啡罐打开，上方悬着一个正在向下倒水的手冲壶，咖啡向外泼溅，散发出袅袅热气。主色调用深棕/咖色（象征咖啡）搭配奶白/浅金（象征奶泡），标题是奶泡风格的手写体“一杯咖啡，唤醒你的清晨”

Translation: Create a promotional poster for this coffee. The main subject is a coffee jar placed on a wooden tabletop, giving a natural and vintage feel. The jar is open, with a pour-over kettle suspended above it, pouring water downward. Coffee is splashing out and steam is rising. The color scheme features deep brown/coffee tones (symbolizing coffee) paired with creamy white/light gold (representing milk foam). The title is in milk foam-style handwriting: ‘A Cup of Coffee to Wake Up Your Morning.’

Qwen-VLo

User

Set the object in the image as the main visual focus, placing it centrally in the frame. The background should have a dark blue tone. The desk lamp is turned on, with light gradually diffusing outward, creating a warm orange-yellow halo effect. In front of the lamp base, place a dark wood desk surface with a few pages of an open magazine and a pair of sunglasses scattered on it. The overall style leans towards a vintage modern aesthetic, with strong contrasts in light and shadow. The color scheme creates tension between dark blue and orange. The title should be placed in the upper right corner of the image, with the text: “Illuminating not just the desk, but an attitude.”

翻译：将图像中的主体设为视觉焦点，置于画面中央。背景采用深蓝色调。台灯处于开启状态，光线逐渐向外扩散，营造出温暖的橙黄色光晕效果。在灯座前方，放置深色木制桌面，桌面上散落着几页打开的杂志和一副太阳镜。整体风格偏向复古现代风，光影对比强烈。色彩搭配上，深蓝色与橙色形成张力。标题位于图像右上角，文字为：“Illuminating not just the desk, but an attitude.”（照明不仅是书桌，更是一种态度。）

Qwen-VLo

User

一、核心视觉元素

主体强化保留高清苹果主体（占比40%-50%画面）增加苹果切面展示：横向剖面（5片星形果核）+纵向剖面（果肉纤维特写）水晶质感：在苹果表面添加1-2滴写实水珠（直径约苹果1/8高度）自然生态元素动态飘落的苹果花花瓣（3-5片，浅粉色渐变）蜜蜂停留特写（透明翅膀动态效果）老树根局部肌理（右下角虚化处理）二、信息架构系统

标题层级主标题（24pt）：「咬一口阳光的味道」副标题（14pt）：「来自海拔1200米的冰川矿物土壤」

Translation: I. Core visual elements. Subject emphasis: keep the high-definition apple as the main subject (40%-50% of the frame); add apple cross-sections, a horizontal section (five-part star-shaped core) plus a vertical section (close-up of the flesh fibers); crystal texture: add 1-2 realistic water droplets on the apple’s surface (diameter about 1/8 of the apple’s height). Natural ecological elements: apple blossom petals drifting down (3-5 petals, light pink gradient); close-up of a resting bee (transparent wings with a motion effect); partial texture of old tree roots (blurred, in the lower right corner). II. Information architecture. Title hierarchy: main title (24pt): “咬一口阳光的味道” (“Take a bite of the taste of sunshine”); subtitle (14pt): “来自海拔1200米的冰川矿物土壤” (“From glacial mineral soil at an altitude of 1,200 meters”).

Qwen-VLo

Besides editing and re-creating images, Qwen VLo can also annotate existing content, for example with detection, segmentation, and edge detection.

Perception and Localization

User

生成摆满水果的桌面
Translation: Generate a table full of fruits

Qwen-VLo

User

预测边缘检测图
Translation: Predict an edge detection map

Qwen-VLo

User

用红色mask分割图中香蕉的边缘
Translation: Use a red mask to segment the edges of the banana in the image.

Qwen-VLo

Qwen VLo supports understanding and generation with multiple input images. (Multi-image input has not been officially released yet. Stay tuned.)

Multiple image input

User

把这些洗浴用品，放到这个红色的篮子里面

Translation: Put these bathing products into this red basket.

Qwen-VLo

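Beyond combined image-and-text input, Qwen VLo also supports direct text-to-image generation, including general images and posters in Chinese and English.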

Text to Image

User

Epic anime artwork of a wizard atop a mountain at night casting a cosmic spell into the dark sky that says “Qwen VLo!” made out of colorful energy

Qwen-VLo

Text to Image

User

Fisheye lens, an orange cat wearing sunglasses, a scarf, and fuzzy ear muffs stands on the snow smiling while taking a selfie. In the background are continuous snow-capped mountains, and a small yellow rubber duck toy peeks out from its backpack.

Qwen-VLo

Text to Image

User

米开朗基罗头像画面凹版印刷，黑白图片，精致的线条细致线条刻画细节，细致的细节，白色背景，高清画质。

Translation: Intaglio print of a portrait of Michelangelo, black-and-white image, exquisite fine lines rendering the details, meticulous detail, white background, high-definition quality.

Qwen-VLo

Text to Image

User

A lone astronaut floats inside a vast space station, painting swirling galaxies onto a massive canvas that hangs weightlessly in the air. Their paintbrush leaves behind trails of cosmic dust, and their suit is stained with nebula-colored hues. Their helmet is off, revealing eyes filled with the reflection of distant planets. Outside the glass window, a black hole looms, twisting light into mesmerizing patterns.

Qwen-VLo

Text to Image

User

Typographic art, a road, a road with a text “2025”, a tiny silhouette of a man waving, a tiny man dragging a suitcase, soft focus effect, new Chinese style, ink lines, wet ink smudges, dark red sprinkled with gold foil background, commercial illustration, rock color, the title is centered with a large handwritten graffiti “A New Beginning”, impact, scribbled handwriting, artistic composition

Qwen-VLo

Text to Image

User

国风海报设计，文字排版艺术，一条路，道路劲头涂鸦“2025”，一个极小的小人剪影挥手，一个极小的人拖着行李箱，柔焦效果，新中式，水墨线条，湿墨晕染，暗红色洒金箔背景，商业插画，岩彩，标题居中手写大大的涂鸦“新的开始”，冲击力，潦草字迹，艺术构图

Translation: Chinese-style poster design, typographic art, a road, graffiti “2025” at the end of the road, a tiny silhouette of a person waving, a tiny person dragging a suitcase, soft focus effect, new Chinese style, ink lines, wet ink smudges, dark red background sprinkled with gold foil, commercial illustration, mineral pigment painting, the title is centered in large handwritten graffiti “新的开始” (“A New Beginning”), impact, scribbled handwriting, artistic composition

Qwen-VLo

Text to Image

User

国风绘画，电影海报，意境，写意，设计感，画面下方是小小的一个骑马奔跑的唐朝官员剪影，红色背景上隐约可见的荔枝纹理，穿插几片绿叶，狂草书法字“长安”，排版设计

Translation: Chinese-style painting, movie poster, atmospheric mood, freehand style, strong sense of design; at the bottom of the frame is a small silhouette of a Tang dynasty official galloping on horseback; a faintly visible lychee texture on a red background, interspersed with a few green leaves; wild cursive calligraphy reading “长安” (Chang’an); typographic layout design

Qwen-VLo

Text to Image

User

Crayon scraper painting, large color blocks, smudges, freehand brushwork, on a golden hill, a father and his daughter are running, slanted, the sky is a deep and dark dreamy star track, serif font headline “Father’s Day”.

Qwen-VLo

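Qwen VLo supports image generation with dynamic aspect ratios and can handle elongated formats with aspect ratios as extreme as 4:1 or 1:3. (Generation at extreme aspect ratios has not been officially released yet. Stay tuned.)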

Text to Image

User

动漫插画；水彩手绘；前景是草坡，草坡上有个人在奔跑，动态感，然后是厚重的白云；蓝色背景；颜色层次多渐变；过渡自然和谐

Translation: Anime illustration; watercolor hand-drawn style; the foreground features a grassy hill where a person is running, conveying a sense of motion. Behind are thick, billowing clouds; blue background; multi-layered color gradients with smooth and harmonious transitions.

Qwen-VLo

Text to Image

User

大海中快速行驶微小的船，光学调色风格，鸟瞰大图，彩虹渐变海平面，一半海水一半油画，国家地理照片，8K分辨率，蜡笔涂抹艺术，互动艺术品

Translation: A tiny boat swiftly sailing across the vast ocean, optical color grading style, aerial view, rainbow gradient horizon, half sea and half oil painting texture, National Geographic photo style, 8K resolution, crayon brushstroke art, interactive artwork, no watermark or logo.

Qwen-VLo

Text to Image

User

丛林时尚标题“Qwen VLo”，flowers，这是一幅植物插画，前景描绘了盛开的仙人掌。背景中充满了各种植物，包括沙漠植物和一些栖息在其中的小鸟。这种设计适合自然主题的封面或横幅

Translation: Jungle Chic, titled “Qwen VLo”, flowers. This is a botanical illustration where the foreground depicts blooming cacti. The background is filled with various plants, including desert flora and small birds inhabiting them. This design is suitable for nature-themed covers or banners.

Qwen-VLo

Text to Image

User

这是一幅充满禅意的金箔彩画，细腻地描绘了一位渔翁静立于芦苇丛中的场景。画面中，微风轻拂，芦苇摇曳，光影交错之间展现出强烈的对比效果。明亮的部分以金箔点缀，闪耀着温暖而神圣的光辉，而暗部则通过深邃的阴影营造出神秘与宁静的氛围。整体画面传递出一种超然物外的空灵之感，仿佛将观者带入一个远离尘世喧嚣、唯有自然与内心对话的静谧世界。

Translation: This is a Zen-inspired colorful gold-leaf painting that delicately depicts an old fisherman standing quietly among reeds. In the scene, a gentle breeze sways the reeds, creating a play of light and shadow that reveals strong contrast. Bright areas are accented with gold leaf, glowing with warm and sacred radiance, while darker parts use deep shadows to evoke a sense of mystery and tranquility. The overall image conveys a transcendent ethereal quality, as if drawing viewers into a serene world far removed from worldly noise — a place of quiet communion between nature and the inner self.

Qwen-VLo

Text to Image

User

一幅壮丽的自然风景画，展现巍峨的雪山耸立于远方，山峰覆盖着洁白的积雪，反射出耀眼的光芒。雪山脚下是一片辽阔的草原，有一个人在骑马，绿草如茵，点缀着五彩斑斓的野花。一条蜿蜒的河流从草原中穿过，河水清澈见底，在阳光下泛起粼粼波光。天空湛蓝，漂浮着几缕白云，与远处的雪山相映成趣。整个画面充满宁静与宏伟的气息，让人感受到大自然的壮美与生命力。超高清细节，广角视角，写实风格，光影效果强烈，8K分辨率。

Translation: A magnificent natural landscape painting showcasing majestic snow-capped mountains standing in the distance, their peaks covered in pure white snow reflecting dazzling light. At the foot of the mountains lies a vast grassland where a rider gallops across the terrain—lush green meadows dotted with vibrant wildflowers. A winding river flows through the plains, its crystal-clear waters shimmering under the sunlight. The sky is a brilliant blue, scattered with a few drifting white clouds that harmonize beautifully with the distant snowy peaks. The entire scene exudes a sense of serenity and grandeur, evoking the awe-inspiring beauty and vitality of nature. Ultra-high-definition details, wide-angle perspective, realistic style, intense lighting effects, 8K resolution.

Qwen-VLo

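As a unified understanding and generation model, Qwen VLo can also analyze and understand the content it generates, for example identifying the breeds of the dog and cat in a generated picture.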

Generation and Understanding

User

Generate a puppy and a kitten.

Qwen-VLo

User

What breed of cat and dog is this?

Qwen-VLo

The image shows a puppy and a kitten interacting in a cozy indoor setting. Let’s identify the breeds:

Dog:

The puppy appears to be a Beagle or a Beagle mix. Key characteristics include:
- Tri-color coat (brown, white, and black).
- Floppy ears.
- Compact body structure.
- Expressive eyes and a friendly demeanor.

Cat:

The kitten appears to be a Tabby cat. Key characteristics include:
- Striped or spotted fur pattern, which is typical of tabby cats.
- A mix of brown, black, and white fur, which is common in tabby patterns.

Both animals are young and appear to be very playful and curious, which is typical behavior for puppies and kittens. The interaction between them suggests a friendly and affectionate relationship, which is often seen when different species bond early in life.

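How to Use

Qwen VLo is trained with dynamic resolution and supports dynamic-resolution generation. On both the input side and the output side, the model supports images of arbitrary resolution and aspect ratio. Users are therefore no longer tied to a fixed format and can generate images suited to different scenarios as needed, whether posters, illustrations, web banners, or social media covers.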
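
To make the idea of arbitrary aspect ratios concrete, here is a minimal sketch of how a caller might turn a target aspect ratio such as 4:1 or 1:3 into pixel dimensions before requesting an image. This is not Qwen VLo’s API or its internal logic: the function name `fit_resolution`, the pixel budget of 1024 × 1024, and the rounding to multiples of 32 are all assumptions chosen purely for illustration.

```python
import math


def fit_resolution(aspect_w, aspect_h, pixel_budget=1024 * 1024, multiple=32):
    """Return a (width, height) pair near aspect_w:aspect_h within a pixel budget.

    pixel_budget and multiple are illustrative assumptions, not Qwen VLo settings.
    """
    if aspect_w <= 0 or aspect_h <= 0:
        raise ValueError("aspect ratio terms must be positive")

    ratio = aspect_w / aspect_h
    # Real-valued size whose area equals the budget and whose shape matches the ratio.
    height = math.sqrt(pixel_budget / ratio)
    width = height * ratio

    def snap(value):
        # Round down to the nearest multiple, but never below one multiple.
        return max(multiple, int(value) // multiple * multiple)

    return snap(width), snap(height)


# Hypothetical target formats: a wide banner, a tall cover, and a square.
for name, (w, h) in {"banner 4:1": (4, 1), "tall 1:3": (1, 3), "square": (1, 1)}.items():
    print(name, fit_resolution(w, h))
```

Rounding down keeps the total area within the budget for ordinary ratios, at the cost of a slightly inexact aspect ratio when the ideal size is not a multiple of 32.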

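In addition, Qwen VLo introduces a new generation mechanism: a progressive process that runs from top to bottom and left to right, with the image becoming clearer step by step. This mechanism improves generation efficiency and is particularly well suited to tasks that require fine control, such as generating long paragraphs of text. For example, when generating an advertisement design or comic panels with a lot of text, Qwen VLo generates the content step by step and revises it gradually. This progressive approach lets users watch the generation in real time and adjust as needed to get the best creative result.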
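
The top-to-bottom, left-to-right ordering can be pictured with a toy sketch. The code below only illustrates a raster-scan fill order over a grid of patches and records a snapshot after each step; it is an assumption-laden illustration, not the model’s actual decoding procedure, and it does not model the refinement of earlier content that the text describes. The names `raster_order` and `progress_frames` are hypothetical.

```python
def raster_order(rows, cols):
    """Yield (row, col) patch coordinates top to bottom, left to right in each row."""
    for r in range(rows):
        for c in range(cols):
            yield r, c


def progress_frames(rows, cols):
    """Return text snapshots of a rows x cols grid as patches fill in raster order."""
    grid = [["." for _ in range(cols)] for _ in range(rows)]
    frames = []
    for r, c in raster_order(rows, cols):
        grid[r][c] = "#"  # mark this patch as generated
        frames.append("\n".join("".join(row) for row in grid))
    return frames


# Print each intermediate state of a small 3 x 4 grid.
for step, frame in enumerate(progress_frames(3, 4), start=1):
    print(f"step {step}\n{frame}\n")
```

Each snapshot corresponds to one intermediate state a viewer could observe, which is what makes it possible to stop or adjust before the whole image is finished.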

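Limitations

Qwen VLo is still in preview and has many shortcomings. During generation, results may be factually inaccurate or not fully consistent with the original image, the model may fail to follow instructions, and it may be unstable in recognizing whether a request is meant for image generation or for understanding. We ask for your understanding. We will keep iterating to improve the model’s stability and robustness.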

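Next Steps: Expressing Ideas with Images, Advancing Understanding through Generation

As multimodal large models gain the ability to take and produce both visual and textual content, a new way of expressing and interacting opens up. In the future, models will not only answer questions in text but also use images to convey ideas and meaning. For example, generating schematic diagrams, adding auxiliary lines, and annotating key regions will give users more diverse ways to communicate.

At the same time, multimodal models that can produce output give us a new form of supervision. Through generation tasks, we can better help the model understand the world. For example, the model can generate intermediate results such as segmentation maps and detection maps to check whether its own understanding is correct, which in turn improves its performance. This is a direction we will keep following and exploring.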
