Technology

VideoCLIP

VideoCLIP uses contrastive learning to align video and text through a transformer-based pre-training strategy that masters zero-shot video-text retrieval.

Developed by researchers at Facebook AI (now Meta), VideoCLIP achieves state-of-the-art performance by training on the HowTo100M dataset (1.2 million narrated videos). It employs a dual-encoder architecture that leverages overlapping video-text clips to learn fine-grained temporal associations. By using an objective that targets both video-to-text and text-to-video alignment, the model excels at zero-shot transfer for tasks like action recognition and video retrieval. It effectively bridges the gap between static image-text models (like CLIP) and dynamic video sequences without requiring manual labels for downstream tasks.

https://arxiv.org/abs/2109.14084

1 project · 1 city

Related technologies

Generate API 1 Imagen Video 2 Make-A-Video 2 Oscar 1 Runway Gen-2 2 Search API 2 UniVL 1 VideoBERT 1 Video embeddings 2

Recent Talks & Demos

Showing 1-1 of 1

Members-Only

Twelve Labs: Chatting with Video

New York City Oct 26

Generate API VideoBERT