Imagen Video

Google Research's high-definition video generation system uses cascaded diffusion models to transform text prompts into 1280x768 resolution clips.

Imagen Video converts text into high-fidelity video through a cascade of seven diffusion models: one base model followed by spatial and temporal super-resolution stages. Generation begins with a base 64x40 resolution model (3 fps) and is upsampled to 1280x768 at 24 fps. A frozen T5-XXL text encoder (4.6 billion parameters) conditions the models on the prompt, which helps them follow complex descriptions, including specific artistic styles such as oil paintings or 3D renders. The temporal super-resolution stages add intermediate frames, which keeps motion consistent across the clip. The result is high-resolution video generated from short text descriptions.
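As a rough illustration of how much work the super-resolution stages do, the sketch below computes the overall upsampling factors from the figures quoted in the description above. The function names are hypothetical, and the per-stage doubling is an assumption made only for this arithmetic; it is not a statement of how the cascade divides the work.

```python
import math

# Figures quoted in the description above.
BASE_W, BASE_H, BASE_FPS = 64, 40, 3
FINAL_W, FINAL_H, FINAL_FPS = 1280, 768, 24


def scale_factors():
    """Return the overall (width, height, frame-rate) upsampling factors."""
    return (FINAL_W / BASE_W, FINAL_H / BASE_H, FINAL_FPS / BASE_FPS)


def doublings(factor):
    """Number of 2x steps needed to reach `factor`.

    Assumption for illustration only: every step doubles its input.
    """
    return math.log2(factor)


if __name__ == "__main__":
    w, h, t = scale_factors()
    print(f"width x{w:g}, height x{h:g}, frame rate x{t:g}")
    print(f"frame-rate doublings: {doublings(t):g}")
```

Under the doubling assumption, the 8x increase in frame rate corresponds to three successive 2x steps, while the spatial factors (20x and 19.2x) are not powers of two and so cannot be reached by doubling alone.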

https://imagen.research.google/video/

2 projects · 2 cities

Related technologies

Make-A-Video (2) · Runway Gen-2 (2) · AR platforms (1) · DALL-E (6) · D-ID Creative Reality Studio (1) · ElevenLabs (36) · Generate API (1) · GPT-4 (528) · Oscar (1) · Search API (2) · Stable Video Diffusion (1) · Synthesia (1) · UniVL (1) · VideoBERT (1) · VideoCLIP (1) · Video embeddings (2)

Recent Talks & Demos

KathaBook

Toronto · May 30

GPT-4 · DALL-E

Twelve Labs: Chatting with Video

New York City · Oct 26

Generate API · VideoBERT