Flamingo
A vision-language model (VLM) from DeepMind, designed for rapid adaptation to new multimodal tasks.
Flamingo bridges pretrained vision encoders and a large language model (such as the 70B-parameter Chinchilla) with a Perceiver Resampler, which lets it ingest interleaved sequences of text, images, and video. It excels in few-shot settings: prompted with as few as 32 task-specific examples, it often outperforms models fine-tuned on far more data. This makes it well suited to visual question answering (VQA) and image captioning with minimal task-specific data.
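The Perceiver Resampler's core idea is that a small, fixed set of learned latent queries cross-attends to a variable number of visual features, so the language model always receives the same number of visual tokens whether the input is one image or many video frames. A minimal NumPy sketch of that resampling step (toy dimensions and a single attention pass; the real module stacks several attention and feed-forward layers, and `latents` would be trained parameters, not random values):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def perceiver_resample(visual_feats, latents):
    """Cross-attention: fixed latent queries attend over a
    variable-length sequence of visual features, yielding a
    fixed-size output regardless of input length."""
    d = latents.shape[-1]
    scores = latents @ visual_feats.T / np.sqrt(d)   # (num_latents, n_feats)
    return softmax(scores) @ visual_feats            # (num_latents, d)

rng = np.random.default_rng(0)
latents = rng.normal(size=(8, 64))        # 8 learned queries (toy dim 64)
image_feats = rng.normal(size=(50, 64))   # 50 patch embeddings from one image
video_feats = rng.normal(size=(200, 64))  # 200 patch embeddings across video frames

print(perceiver_resample(image_feats, latents).shape)  # (8, 64)
print(perceiver_resample(video_feats, latents).shape)  # (8, 64)
```

Both inputs are resampled to the same 8-token output, which is what allows arbitrary interleavings of images and video to be fed to a frozen language model.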