Qwen-Image-Layered

AI Image Editor is getting closer to Photoshop

AI image editors can now work in layers! Qwen-Image-Layered, open-sourced by the Alibaba Qwen Team, decomposes a single RGB image into multiple semantically independent RGBA layers, enabling lossless, high-consistency end-to-end image editing.

Layer Decomposition Demo
Layer Decomposition Example

Input an RGB image, output multiple independent layers with alpha channels.
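"Lossless" here means the output layers, alpha-composited back-to-front, reproduce the input image. A minimal sketch of that round-trip check with Pillow, assuming the pipeline returns layers ordered from bottom to top:

```python
from PIL import Image

def flatten(layers):
    """Alpha-composite a bottom-to-top list of RGBA layers into one image."""
    canvas = Image.new("RGBA", layers[0].size, (0, 0, 0, 0))
    for layer in layers:
        canvas = Image.alpha_composite(canvas, layer)
    return canvas

# flatten(layers) should closely match the original input image.
```

If the model returns layers top-to-bottom instead, reverse the list before compositing.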

Redefining Image Generation & Editing

Qwen-Image-Layered is not just a generative model. By simulating the layer logic of professional design software, it solves the consistency pain points of traditional diffusion models.

Image Decomposition (I2L)

Decomposes ordinary images into semantically meaningful RGBA layers. Supports precise separation of complex scenes, text, and semi-transparent objects.

Variable & Recursive Layers

Supports a variable number of layers (e.g., 3 or 8) and recursive decomposition of individual layers, for arbitrarily deep hierarchical control.
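Recursive decomposition is a simple control loop: feed each produced layer back into the model. A sketch of that pattern, where `decompose` is a stand-in for the real pipeline call shown in Quick Start (here it just splits the image into horizontal bands for illustration):

```python
from PIL import Image

def decompose(image: Image.Image, num_layers: int) -> list:
    """Stand-in for pipeline(image=image, layers=num_layers).
    Splits the image into horizontal bands purely for illustration."""
    w, h = image.size
    band = h // num_layers
    layers = []
    for i in range(num_layers):
        layer = Image.new("RGBA", (w, h), (0, 0, 0, 0))
        crop = image.crop((0, i * band, w, (i + 1) * band))
        layer.paste(crop, (0, i * band))
        layers.append(layer)
    return layers

def decompose_recursive(image, depth, num_layers=3):
    """Decompose, then re-decompose every resulting layer `depth` times."""
    if depth == 0:
        return [image]
    result = []
    for layer in decompose(image, num_layers):
        result.extend(decompose_recursive(layer, depth - 1, num_layers))
    return result
```

With the real model you would likely stop recursing on layers that are already atomic (e.g., mostly transparent or single-object), rather than expanding every branch.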

Inherent Editability

Each layer can be independently moved, scaled, deleted, or recolored. Physical isolation ensures background consistency during edits.
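Because layers are physically isolated, an edit touches only one layer's pixels; re-compositing leaves every other layer bit-identical. A hedged sketch of a per-layer move with Pillow (the layer index and offset are illustrative):

```python
from PIL import Image

def move_layer(layers, index, offset):
    """Return a new layer list with one RGBA layer shifted by (dx, dy).

    All other layers are passed through untouched, so the rest of the
    scene is guaranteed unchanged after re-compositing.
    """
    dx, dy = offset
    moved = Image.new("RGBA", layers[index].size, (0, 0, 0, 0))
    # Paste with the layer's own alpha as mask so transparency is preserved.
    moved.paste(layers[index], (dx, dy), layers[index])
    edited = list(layers)
    edited[index] = moved
    return edited
```

Deleting a layer is just dropping it from the list; recoloring or scaling follows the same pattern of replacing one entry before compositing.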

Text to Layered Image (T2L)

Supports generating multi-layered images directly from text prompts, providing ready-to-use layered assets for creative design.

Seamless Integration

Integrates with Qwen-Image-Edit, enabling advanced inpainting and replacement operations on specific layers.

Fully Open Source

Apache 2.0 license. Model weights, codebase, and paper are all public. Available on Hugging Face and ModelScope.

Real-world Applications

Precise Semantic Decomposition

The examples show decomposition of complex images: the input on the left, multiple RGBA layers (with transparency) on the right, separating semantic elements such as background, objects, and text.

Decomposition 1
Decomposition 2

Technical Highlights

RGBA-VAE

A variational autoencoder that unifies RGB and RGBA images in a single latent space, giving transparency a representation compatible with existing RGB latents.

VLD-MMDiT

Built on Qwen2.5-VL, a Diffusion Transformer architecture that supports a variable number of output layers (Variable-Layer Decomposition).

Multi-stage Training

Starting from large-scale pre-trained image generation models, fine-tuned with carefully designed strategies for multi-layer decomposition.

High-quality Data

Built datasets from real PSD files, ensuring the model handles real-world challenges like semi-transparent occlusion and complex layouts.

Quick Start

Python Example
from diffusers import QwenImageLayeredPipeline
import torch
from PIL import Image

# 1. Load model
pipeline = QwenImageLayeredPipeline.from_pretrained("Qwen/Qwen-Image-Layered")
pipeline = pipeline.to("cuda", torch.bfloat16)

# 2. Prepare image
image = Image.open("test.png").convert("RGBA")

# 3. Set params & infer
inputs = {
    "image": image,
    "layers": 4,              # Specify 4 layers
    "num_inference_steps": 50,
    "resolution": 640,
    "true_cfg_scale": 4.0,
}

with torch.inference_mode():
    output = pipeline(**inputs)
    layers = output.images[0] # Returns list of layers

# 4. Save layers
for i, layer in enumerate(layers):
    layer.save(f"layer_{i}.png")

Requires transformers>=4.51.3 and the latest diffusers, installed from source:

pip install git+https://github.com/huggingface/diffusers python-pptx