

Diffusion Transformers

Diffusion Transformers (DiT) replace the conventional U-Net backbone in latent diffusion models with a Transformer that operates on latent patches in the style of a Vision Transformer (ViT), so that image generation quality improves as the model is scaled up.

The Diffusion Transformer (DiT) is a scalable generative model architecture introduced by William Peebles and Saining Xie (2022). It replaces the standard convolutional U-Net of a latent diffusion model with a Transformer that operates on a sequence of latent image patches. Global self-attention lets every patch attend to every other patch, and the authors report that sample quality improves steadily as model compute increases. The largest configuration, DiT-XL/2 (675M parameters, patch size 2), achieved an FID of 2.27 on the class-conditional ImageNet 256x256 benchmark, state of the art at the time of publication, showing that a Transformer can serve as an effective, scalable backbone for high-fidelity image synthesis.
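
As a rough illustration of what operating on latent image patches means, the sketch below splits a latent into non-overlapping patches and flattens each one into a token. It assumes the 32x32x4 latent that an 8x-downsampling VAE produces for a 256x256 image, and the patch size of 2 indicated by the "/2" in DiT-XL/2. The function name patchify and the nested-list layout are illustrative choices, not the authors' code, and the linear embedding, positional encoding, and conditioning that a real DiT applies to each token are omitted.

```python
def patchify(latent, patch_size):
    """Split a latent stored as nested lists [C][H][W] into flattened patches.

    Returns a list of tokens in row-major patch order. Each token holds
    C * patch_size * patch_size values. Hypothetical helper for illustration.
    """
    channels = len(latent)
    height = len(latent[0])
    width = len(latent[0][0])
    if height % patch_size or width % patch_size:
        raise ValueError("latent height and width must be divisible by patch_size")

    tokens = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            # Gather every channel's values inside this patch into one flat token.
            token = [
                latent[c][top + dy][left + dx]
                for c in range(channels)
                for dy in range(patch_size)
                for dx in range(patch_size)
            ]
            tokens.append(token)
    return tokens


# Assumed setup: a 4-channel 32x32 latent (all zeros here) and patch size 2.
latent = [[[0.0] * 32 for _ in range(32)] for _ in range(4)]
tokens = patchify(latent, patch_size=2)
print(len(tokens), len(tokens[0]))  # 256 tokens, each with 16 values
```

Under these assumptions the Transformer sees a sequence of (32 / 2)^2 = 256 tokens. Halving the patch size quadruples the sequence length, which is why smaller patches cost more compute.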

https://wpeebles.com/DiT