Technology
Make-A-Video
Meta AI's generative system transforms text prompts into high-definition video clips using a diffusion model trained on millions of video-text and image-text pairs.
Meta AI researchers designed Make-A-Video to generate five-second video clips from simple text descriptions. The architecture extends a text-to-image diffusion model with spatiotemporal convolution and attention layers, trained on the WebVid-10M dataset (10 million video-text pairs) and a subset of the HD-VILA-100M dataset. By decoupling motion from appearance, the system learns how objects move from unlabeled video data while drawing visual fidelity from image-text pairs. Output videos run at 16 frames per second at 768x768 resolution. Examples range from a stylized dog wearing a superhero cape to realistic waves crashing on a beach. This technology reduces the need for manual video production for short-form creative assets.
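The "decoupling motion from appearance" idea can be illustrated by factorizing a 3D convolution over a video into a 2D spatial convolution applied per frame, followed by a 1D convolution across time. The sketch below is a minimal NumPy illustration of that factorization; the kernel sizes, shapes, and function names are illustrative assumptions, not Meta's actual implementation.

```python
import numpy as np

def spatial_conv(frames, k2d):
    """Apply one 2D kernel to every frame independently (valid convolution).

    frames: array of shape (T, H, W); k2d: (kh, kw) kernel.
    """
    T, H, W = frames.shape
    kh, kw = k2d.shape
    out = np.zeros((T, H - kh + 1, W - kw + 1))
    for t in range(T):
        for i in range(H - kh + 1):
            for j in range(W - kw + 1):
                out[t, i, j] = np.sum(frames[t, i:i + kh, j:j + kw] * k2d)
    return out

def temporal_conv(frames, k1d):
    """Apply a 1D kernel across the time axis at each pixel (valid convolution)."""
    T, H, W = frames.shape
    kt = k1d.shape[0]
    out = np.zeros((T - kt + 1, H, W))
    for t in range(T - kt + 1):
        # Weighted sum of kt consecutive frames.
        out[t] = np.tensordot(k1d, frames[t:t + kt], axes=(0, 0))
    return out

def pseudo3d_conv(video, k2d, k1d):
    """Factorized 'pseudo-3D' layer: spatial conv per frame, then temporal conv."""
    return temporal_conv(spatial_conv(video, k2d), k1d)

# Hypothetical usage: an 8-frame, 16x16 clip with a 3x3 spatial blur
# and a 3-tap temporal smoothing kernel.
video = np.random.default_rng(0).standard_normal((8, 16, 16))
out = pseudo3d_conv(video, np.ones((3, 3)) / 9.0, np.array([0.25, 0.5, 0.25]))
print(out.shape)  # (6, 14, 14)
```

Because the spatial kernel can be initialized from a pretrained image model while the temporal kernel is learned from video alone, this factorization is what lets appearance come from image-text data and motion from video-only data.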
Related technologies
Recent Talks & Demos