ControlNet

ControlNet is a neural network architecture that adds precise spatial conditioning (e.g., Canny edges, OpenPose keypoints) to large, pretrained text-to-image diffusion models like Stable Diffusion.

ControlNet gives diffusion models fine-grained control over image composition. It works by creating two copies of the model’s weights: a 'locked' copy that preserves the original Stable Diffusion capabilities and a 'trainable' copy that learns the new condition. The two copies are connected through 'zero convolutions', which are convolution layers whose weights and biases are initialized to zero. At the start of finetuning the trainable copy therefore contributes nothing to the locked model's output, so no harmful noise disturbs the pretrained features, and its influence grows only as the zero convolutions are trained. This architecture allows training on small datasets (under 50k images) for tasks like pose-to-image or depth-to-image, steering the output with specific inputs like OpenPose keypoints or Canny edge maps.
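The locked/trainable wiring can be illustrated with a toy sketch. This is not the ControlNet implementation: it assumes a single hypothetical block (`toy_block`, a 1x1 convolution plus ReLU) in place of a real diffusion-model block, represents a feature map as a plain list of pixels, and uses made-up names throughout. It only shows why zero-initialized 1x1 convolutions leave the locked output unchanged at the start of training.

```python
import copy
import random

random.seed(0)  # deterministic toy weights

def conv1x1(pixels, weight, bias):
    """1x1 convolution: the same channel-mixing linear map at every pixel.

    pixels: list of pixels, each a list of C_in floats
    weight: C_out rows of C_in floats; bias: C_out floats
    """
    return [
        [sum(w * v for w, v in zip(row, px)) + b for row, b in zip(weight, bias)]
        for px in pixels
    ]

def zero_conv(channels):
    """Parameters of a 'zero convolution': weights and bias all start at zero."""
    return [[0.0] * channels for _ in range(channels)], [0.0] * channels

def toy_block(pixels, params):
    """Hypothetical stand-in for one network block: 1x1 conv followed by ReLU."""
    weight, bias = params
    return [[max(0.0, v) for v in px] for px in conv1x1(pixels, weight, bias)]

def add(a, b):
    """Element-wise sum of two feature maps of the same shape."""
    return [[u + v for u, v in zip(pa, pb)] for pa, pb in zip(a, b)]

def controlled_block(x, cond, locked, trainable, zero_in, zero_out):
    """Locked block output plus the trainable branch, joined by zero convs."""
    locked_out = toy_block(x, locked)                # frozen, never updated
    hint = conv1x1(cond, *zero_in)                   # condition enters via a zero conv
    branch_out = toy_block(add(x, hint), trainable)  # trainable copy sees x + hint
    return add(locked_out, conv1x1(branch_out, *zero_out))

C = 3  # channels (arbitrary toy size)
locked = (
    [[random.uniform(-1, 1) for _ in range(C)] for _ in range(C)],
    [random.uniform(-1, 1) for _ in range(C)],
)
trainable = copy.deepcopy(locked)  # trainable copy starts from the same weights
zero_in, zero_out = zero_conv(C), zero_conv(C)

x = [[random.uniform(-1, 1) for _ in range(C)] for _ in range(4)]    # 4 "pixels"
cond = [[random.uniform(0, 1) for _ in range(C)] for _ in range(4)]  # e.g. an edge map

baseline = toy_block(x, locked)
at_init = controlled_block(x, cond, locked, trainable, zero_in, zero_out)
print(at_init == baseline)  # True: the branch adds exactly zero at initialization

# Pretend training moved the output zero conv away from zero (arbitrary values).
nudged_out = (
    [[0.5 if i == j else 0.0 for j in range(C)] for i in range(C)],
    [0.1] * C,
)
after = controlled_block(x, cond, locked, trainable, zero_in, nudged_out)
print(after != baseline)  # True: the branch now changes the output
```

Because the branch's contribution passes through a layer that starts at zero, the combined model begins as an exact copy of the pretrained one, and the condition only gains influence as those layers are trained.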