VideoCLIP Projects


VideoCLIP

VideoCLIP uses contrastive pre-training to align video and text with transformer encoders, targeting zero-shot video-text understanding tasks such as retrieval.

Developed by researchers at Facebook AI (now Meta), VideoCLIP is pre-trained on the HowTo100M dataset (about 1.2 million narrated videos) and reported state-of-the-art zero-shot results at the time of publication. It uses a dual-encoder architecture, one transformer for video and one for text, and forms positive pairs from temporally overlapping video and text clips so that the model learns fine-grained temporal associations. The contrastive objective is symmetric: it scores both video-to-text and text-to-video alignment. The resulting model transfers zero-shot to tasks such as text-video retrieval, action segmentation, and action localization. It carries the idea behind static image-text models (like CLIP) over to video sequences without requiring manual labels for downstream tasks.
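The symmetric objective described above can be illustrated with a small sketch. This is not the authors' implementation: the function names, the temperature value of 0.07, and the use of cosine similarity over L2-normalized embeddings are assumptions chosen for illustration. It treats row i of the video batch and row i of the text batch as a positive pair and all other rows as negatives.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (assumes the vector is non-zero)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def log_softmax(row):
    """Numerically stable log-softmax over a list of scores."""
    peak = max(row)
    log_sum = peak + math.log(sum(math.exp(x - peak) for x in row))
    return [x - log_sum for x in row]


def symmetric_contrastive_loss(video_embs, text_embs, temperature=0.07):
    """Sum of video-to-text and text-to-video InfoNCE-style losses.

    video_embs[i] and text_embs[i] are assumed to be a matching pair.
    The temperature is a hypothetical default, not a value from the paper.
    """
    videos = [l2_normalize(v) for v in video_embs]
    texts = [l2_normalize(t) for t in text_embs]
    n = len(videos)

    # sim[i][j] = scaled cosine similarity between video i and text j.
    sim = [
        [sum(a * b for a, b in zip(videos[i], texts[j])) / temperature
         for j in range(n)]
        for i in range(n)
    ]

    # Video-to-text: each video should rank its own text highest.
    v2t = -sum(log_softmax(sim[i])[i] for i in range(n)) / n

    # Text-to-video: same computation on the transposed matrix.
    sim_t = [list(col) for col in zip(*sim)]
    t2v = -sum(log_softmax(sim_t[j])[j] for j in range(n)) / n

    return v2t + t2v


if __name__ == "__main__":
    videos = [[1.0, 0.0], [0.0, 1.0]]
    aligned_texts = [[1.0, 0.0], [0.0, 1.0]]
    swapped_texts = [[0.0, 1.0], [1.0, 0.0]]
    print(symmetric_contrastive_loss(videos, aligned_texts))
    print(symmetric_contrastive_loss(videos, swapped_texts))
```

Correctly paired embeddings give a loss close to zero, while swapping the text rows gives a much larger loss, which is the signal that pulls matching video and text clips together during pre-training.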

https://arxiv.org/abs/2109.14084