
Seed Diffusion

Next-Gen Code Generation: A Revolution in Speed & Quality

Seed Diffusion Preview is an experimental discrete diffusion language model that elevates inference speed to a new level through non-sequential, parallel generation, all while maintaining top-tier code quality.
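To make the contrast with left-to-right decoding concrete, here is a minimal sketch (not Seed Diffusion's actual implementation) of how a discrete diffusion decoder generates in parallel: it starts from a fully masked sequence and commits several positions per refinement step. The names `model`, `MASK_ID`, and the confidence-based unmasking schedule are all illustrative assumptions.

import torch

MASK_ID = 0  # hypothetical id of the [MASK] token

def diffusion_decode(model, length, num_steps=8, device="cuda"):
    # Start from an all-[MASK] sequence and refine it in parallel.
    tokens = torch.full((1, length), MASK_ID, dtype=torch.long, device=device)
    for step in range(num_steps):
        logits = model(tokens)                   # one pass predicts every position
        conf, pred = logits.softmax(-1).max(-1)  # per-position confidence and argmax
        masked = tokens.eq(MASK_ID)
        if not masked.any():
            break
        # Commit the most confident still-masked positions, paced so the
        # whole sequence is filled in by the final step.
        k = max(1, int(masked.sum().item() / (num_steps - step)))
        conf = conf.masked_fill(~masked, -1.0)
        idx = conf.topk(k, dim=-1).indices
        tokens.scatter_(1, idx, pred.gather(1, idx))
    return tokens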

2,146 tokens/second inference speed (measured on H20 GPUs)

5.4x speedup over autoregressive models of similar scale

Key Technologies for Accelerating Diffusion Models

To address the efficiency and logic challenges of traditional diffusion models, Seed Diffusion introduces three core innovations, each explained in the sections below.

Training Process Evolution

Stage 1: Pattern Filling

def calculate(a, b):
    return a [MASK] b

The model learns to fill `[MASK]`, mastering local syntax and patterns.

Stage 2: Logical Editing

def calculate(a, b):    # before: perturbed line the model strikes out
def add(a, b):          # after: the model's correction
    return a + b

Perturbation via insertion/deletion forces the model to review global logic and make corrections.

From Pattern Filling to Logical Editing

Traditional mask-based models can only "fill in the blanks" and cannot correct errors. To address this, a two-stage learning strategy was designed:

  1. Mask-based Training: teaches the model local context and code patterns, but induces a "spurious correlation": the model learns to blindly trust whatever is left unmasked.
  2. Edit-based Training: perturbs data with insertions and deletions, forcing the model to re-evaluate the global validity of all code. This breaks the spurious correlation and significantly improves the model's ability to understand and repair code logic (see the toy sketch after this list).
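As a deliberately simplified illustration of the two corruption schemes (the rates and toy vocabulary below are made up for the example), mask corruption leaves unmasked tokens guaranteed correct, while edit corruption does not:

import random

MASK = "[MASK]"

def mask_corrupt(tokens, rate=0.3):
    # Stage 1: replace a random subset of tokens with [MASK];
    # every surviving token is still guaranteed correct.
    return [MASK if random.random() < rate else t for t in tokens]

def edit_corrupt(tokens, rate=0.2, vocab=("+", "-", "return", "(", ")")):
    # Stage 2: random insertions and deletions mean no surviving token
    # can be trusted, so the model must re-check all of them.
    out = []
    for t in tokens:
        if random.random() < rate / 2:
            continue                          # deletion
        out.append(t)
        if random.random() < rate / 2:
            out.append(random.choice(vocab))  # insertion
    return out

Training on the output of `edit_corrupt` is what removes the shortcut of treating visible tokens as ground truth.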

Performance Deep Dive

Seed Diffusion demonstrates performance comparable to, or even exceeding, top autoregressive models on several core code benchmarks, while being significantly faster.

From Theory to Efficient Engineering

Beyond algorithmic innovations, comprehensive system-level optimizations are key to achieving high performance.

Block-wise Parallel Sampling

To balance computation and latency, the model employs a block-wise parallel sampling scheme. This method maintains a causal order between blocks and uses KV-caching to reuse information from previously generated blocks, improving efficiency while maintaining quality.
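The control flow can be sketched as follows, assuming a hypothetical `model.denoise_block` API that refines one masked block while reusing (and extending) the KV-cache of everything already generated:

MASK_ID = 0  # hypothetical [MASK] token id

def blockwise_decode(model, prompt_ids, num_blocks, block_len, steps_per_block=4):
    # Blocks are produced strictly left to right (causal across blocks),
    # while tokens *within* a block are denoised in parallel.
    ids, kv_cache = list(prompt_ids), None
    for _ in range(num_blocks):
        block = [MASK_ID] * block_len
        for _ in range(steps_per_block):
            # Hypothetical API: refine the current block conditioned on all
            # finished tokens, whose KV-cache is reused rather than recomputed.
            block, kv_cache = model.denoise_block(ids, block, kv_cache)
        ids.extend(block)  # the block is final; its KV entries become reusable
    return ids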

System-level Optimization

The project leverages an in-house infrastructure framework, specially optimized for diffusion sampling, to comprehensively optimize block-wise inference. Block size has a direct impact on inference time, so profiling a range of block sizes provides the basis for selecting the optimal one.
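For illustration only, such a sweep could look like the snippet below, reusing the `blockwise_decode` sketch above; the candidate sizes are arbitrary, and the real selection also weighs generation quality, which a wall-clock sweep alone does not capture.

import time

def sweep_block_sizes(model, prompt_ids, total_len, sizes=(4, 8, 16, 32)):
    # Time end-to-end decoding at several block sizes to locate the
    # latency sweet spot (an illustrative harness, not the authors' tooling).
    timings = {}
    for b in sizes:
        start = time.perf_counter()
        blockwise_decode(model, prompt_ids, num_blocks=total_len // b, block_len=b)
        timings[b] = time.perf_counter() - start
    return timings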