CLIP
CLIP (Contrastive Language–Image Pre-training) is an OpenAI neural network that connects visual and textual data for powerful zero-shot image classification.
CLIP is a multimodal model that learns visual concepts directly from natural language supervision. It jointly trains a text encoder and an image encoder to predict which text–image pairs match within a massive dataset of 400 million pairs. This contrastive pre-training removes the need for expensive, manually labeled datasets such as ImageNet. The key capability is zero-shot transfer: the model can classify an image into any category described in text, such as 'a photo of a vintage motorcycle,' without task-specific training data.
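The zero-shot step itself is simple once both encoders exist: embed the image and each candidate caption into the shared space, L2-normalize, take cosine similarities, and softmax over the captions. A minimal NumPy sketch of that scoring step, using small random vectors as stand-ins for real encoder outputs (the embeddings, dimensionality, and the temperature value here are illustrative assumptions, not CLIP's actual weights):

```python
import numpy as np

def zero_shot_classify(image_emb, text_embs, temperature=100.0):
    """Score one image embedding against candidate caption embeddings.

    L2-normalize both sides so dot products become cosine similarities,
    then softmax over captions. The temperature mimics CLIP's learned
    logit scale (value here is an assumption for illustration).
    """
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    logits = temperature * (txt @ img)          # one similarity per caption
    exp = np.exp(logits - logits.max())         # numerically stable softmax
    return exp / exp.sum()

# Toy stand-ins for encoder outputs: three candidate captions, and an
# image embedding constructed to lie closest to caption index 1.
rng = np.random.default_rng(0)
text_embs = rng.normal(size=(3, 8))
image_emb = text_embs[1] + 0.05 * rng.normal(size=8)

probs = zero_shot_classify(image_emb, text_embs)
print(probs.argmax())  # index of the best-matching caption
```

In a real pipeline the caption embeddings would come from the text encoder applied to prompts like "a photo of a vintage motorcycle", so the label set can be changed at inference time just by writing new prompts.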