Imagen Video

Google Research's high-definition video generation system uses cascaded diffusion models to transform text prompts into 1280x768 resolution clips.

Imagen Video generates video from text with a cascade of seven diffusion models: one base model, three spatial super-resolution (SSR) models, and three temporal super-resolution (TSR) models. The base model produces a 16-frame clip at 40x24 resolution and 3 fps, and the super-resolution stages upsample it to 128 frames at 1280x768 and 24 fps (about 5.3 seconds). Text conditioning comes from a frozen T5-XXL encoder (4.6 billion parameters), which helps the output follow complex prompts, including specific artistic styles such as oil paintings or 3D renders. The temporal stages fill in intermediate frames and the spatial stages add detail, so the cascade is designed to keep motion consistent across frames while raising resolution.
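The cascade's shape arithmetic can be traced in a few lines. This is a minimal sketch, not the model itself: the names `VideoShape`, `STAGES`, and `trace_cascade` are hypothetical, and the stage order and the 16-frame, 40x24 base shape are assumptions taken from the pipeline figure in the Imagen Video paper. It only tracks frame count, resolution, and frame rate through each stage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VideoShape:
    frames: int
    width: int
    height: int
    fps: int


# Assumed base output: 16 frames at 40x24 and 3 fps (per the paper's pipeline figure).
BASE = VideoShape(frames=16, width=40, height=24, fps=3)

# Assumed stage order: (kind, upsampling factor).
# TSR multiplies frame count and frame rate; SSR multiplies width and height.
STAGES = [
    ("TSR", 2),
    ("SSR", 2),
    ("SSR", 4),
    ("TSR", 2),
    ("TSR", 2),
    ("SSR", 4),
]


def apply_stage(shape: VideoShape, kind: str, factor: int) -> VideoShape:
    """Return the clip shape after one super-resolution stage."""
    if kind == "TSR":
        return VideoShape(shape.frames * factor, shape.width, shape.height,
                          shape.fps * factor)
    if kind == "SSR":
        return VideoShape(shape.frames, shape.width * factor,
                          shape.height * factor, shape.fps)
    raise ValueError(f"unknown stage kind: {kind}")


def trace_cascade(base: VideoShape = BASE, stages=STAGES) -> list:
    """List the clip shape after the base model and after each later stage."""
    shapes = [base]
    for kind, factor in stages:
        shapes.append(apply_stage(shapes[-1], kind, factor))
    return shapes


if __name__ == "__main__":
    for shape in trace_cascade():
        seconds = shape.frames / shape.fps
        print(f"{shape.frames:>3} frames  {shape.width}x{shape.height}  "
              f"{shape.fps} fps  ({seconds:.2f} s)")
```

Under these assumptions the trace has seven entries, one per model, and the clip duration stays constant at 16 / 3 seconds because each TSR stage doubles frames and frame rate together.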

https://imagen.research.google/video/