Seed Diffusion
Next-Gen Code Generation: A Revolution in Speed & Quality
Seed Diffusion Preview is an experimental discrete diffusion language model that elevates inference speed to a new level through non-sequential, parallel generation, all while maintaining top-tier code quality.
2,146
Tokens/Second Inference Speed
Measured performance on H20 GPUs
5.4x
Speed Increase
Compared to autoregressive models of similar scale
Key Technologies for Accelerating Diffusion Models
To address the efficiency and logic challenges of traditional diffusion models, Seed Diffusion introduces three core innovations, explored in turn below.
Training Process Evolution
Stage 1: Pattern Filling
```python
def calculate(a, b):
    return a [MASK] b
```
The model learns to fill `[MASK]`, mastering local syntax and patterns.
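As a rough illustration, here is a minimal sketch of what stage-1 mask corruption can look like. The function name `mask_corrupt`, the string-token representation, and the fixed mask rate are assumptions for the example; an actual training pipeline would operate on token IDs under a learned noise schedule.

```python
import random

MASK = "[MASK]"

def mask_corrupt(tokens, mask_rate=0.3, seed=None):
    """Stage-1 corruption sketch (hypothetical): replace a random subset
    of tokens with [MASK]; training asks the model to recover them."""
    rng = random.Random(seed)
    return [MASK if rng.random() < mask_rate else t for t in tokens]

# mask_corrupt(["return", "a", "+", "b"]) might yield ["return", "a", "[MASK]", "b"]
```

Note that every non-masked token survives untouched, which is exactly what lets the model learn to trust them blindly.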
Stage 2: Logical Editing
```diff
- def calculate(a, b):
+ def add(a, b):
      return a + b
```
Perturbation via insertion/deletion forces the model to review global logic and make corrections.
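A matching sketch of stage-2 edit corruption, again with hypothetical names, rates, and noise vocabulary: random deletions and insertions perturb the sequence, so no surviving token can be trusted blindly and the model must re-check every position.

```python
import random

def edit_corrupt(tokens, edit_rate=0.2, noise_vocab=("0", "pass", "+"), seed=None):
    """Stage-2 corruption sketch (hypothetical rates and vocab): randomly
    delete tokens and insert spurious ones, forcing global re-evaluation."""
    rng = random.Random(seed)
    out = []
    for tok in tokens:
        r = rng.random()
        if r < edit_rate / 2:
            continue                             # deletion: drop this token
        if r < edit_rate:
            out.append(rng.choice(noise_vocab))  # insertion: spurious token
        out.append(tok)
    return out
```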
From Pattern Filling to Logical Editing
Traditional mask-based models can only "fill in the blanks" and cannot correct errors. To address this, a two-stage learning strategy was designed:
1. Mask-based Training: Teaches the model local context and code patterns, but creates a "spurious correlation": the model learns to blindly trust the unmasked tokens.
2. Edit-based Training: Perturbs data with insertions and deletions, forcing the model to re-evaluate the global validity of all code. This breaks the "spurious correlation" and significantly improves logic comprehension and code-repair ability.
Performance Deep Dive
Seed Diffusion performs on par with, or better than, top autoregressive models on several core code benchmarks, while generating significantly faster.
From Theory to Efficient Engineering
Beyond algorithmic innovations, comprehensive system-level optimizations are key to achieving high performance.
Block-wise Parallel Sampling
To balance computation and latency, the model employs a block-wise parallel sampling scheme. This method maintains a causal order between blocks and uses KV-caching to reuse information from previously generated blocks, improving efficiency while maintaining quality.
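A minimal sketch of what block-wise sampling can look like, assuming a model that maps a token sequence to per-position logits. The block size, step count, and confidence-based re-masking rule are illustrative choices, and the KV-cache reuse described above is omitted for brevity (the prefix is simply re-encoded each step).

```python
import torch

@torch.no_grad()
def blockwise_sample(model, prompt_ids, num_blocks=4, block_size=32,
                     denoise_steps=4, mask_id=0):
    """Block-wise parallel sampling sketch (assumed model interface)."""
    seq = prompt_ids  # [1, T] prompt token IDs
    for _ in range(num_blocks):
        # Append a fully masked block; earlier blocks are already final,
        # which keeps the order causal across blocks.
        block = torch.full((1, block_size), mask_id, dtype=seq.dtype)
        seq = torch.cat([seq, block], dim=1)
        for step in range(denoise_steps):
            logits = model(seq)[:, -block_size:]     # assumed output: [1, T, vocab]
            conf, pred = logits.softmax(-1).max(-1)  # parallel prediction per slot
            # Keep the most confident predictions, re-mask the rest;
            # the final step commits the whole block.
            keep = max(1, block_size * (step + 1) // denoise_steps)
            cutoff = conf.topk(keep, dim=-1).values[:, -1:]
            pred = torch.where(conf >= cutoff, pred, torch.full_like(pred, mask_id))
            seq[:, -block_size:] = pred
    return seq
```

Within each block, all positions are denoised together over a handful of steps, which is where the parallel speedup over token-by-token decoding comes from.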
System-level Optimization
The project leverages an in-house infrastructure framework, specially optimized for diffusion sampling, to accelerate block-wise inference end to end. Profiling inference time across candidate block sizes provides a basis for selecting the optimal block size for a given GPU.
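As a sketch of how that selection might be made, the harness below times the hypothetical `blockwise_sample` function from the previous section across candidate block sizes; the sizes and the token budget are arbitrary choices for illustration.

```python
import time
import torch

def profile_block_sizes(model, prompt_ids, total_tokens=512,
                        block_sizes=(8, 16, 32, 64, 128)):
    """Time blockwise_sample (defined above) for each candidate block size.
    Larger blocks mean fewer denoising passes but more parallel compute
    per pass; the sweet spot depends on the hardware."""
    timings = {}
    for bs in block_sizes:
        if torch.cuda.is_available():
            torch.cuda.synchronize()  # exclude previously queued GPU work
        start = time.perf_counter()
        blockwise_sample(model, prompt_ids,
                         num_blocks=total_tokens // bs, block_size=bs)
        if torch.cuda.is_available():
            torch.cuda.synchronize()  # wait for the GPU to finish
        timings[bs] = time.perf_counter() - start
    return timings

# usage: timings = profile_block_sizes(model, prompt_ids)
#        best = min(timings, key=timings.get)
```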