Technology

Flamingo

A premier vision-language model (VLM) built by DeepMind for rapid multimodal task adaptation.

Flamingo bridges a pretrained vision encoder and a large language model (such as the 70B Chinchilla) with a Perceiver Resampler, which compresses variable-length visual features into a fixed set of tokens the language model can attend to. This design lets the model ingest interleaved sequences of text, images, and video. It excels at few-shot learning: with as few as 32 task-specific examples, it can rival or surpass models fine-tuned on far larger datasets, making it well suited to visual question answering (VQA) and image captioning with minimal task-specific data.
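The Perceiver Resampler's core idea can be illustrated with a minimal single-head cross-attention sketch: a small, fixed set of learned latent queries attends to a variable-length sequence of visual features, so the output token count stays constant regardless of input size. The dimensions, weight shapes, and single-head setup below are illustrative assumptions, not the published architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def perceiver_resampler(visual_features, latents, wq, wk, wv):
    """One cross-attention step: fixed latent queries attend to a
    variable-length visual feature sequence and return a fixed-size
    token set for the language model. (Simplified sketch.)"""
    q = latents @ wq              # (num_latents, d)
    k = visual_features @ wk      # (seq_len, d)
    v = visual_features @ wv      # (seq_len, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    attn = softmax(scores, axis=-1)
    return attn @ v               # (num_latents, d)

rng = np.random.default_rng(0)
d = 16                                    # toy feature dimension
num_latents = 8                           # fixed output length
visual = rng.standard_normal((257, d))    # e.g. ViT patch features
latents = rng.standard_normal((num_latents, d))
wq, wk, wv = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))

out = perceiver_resampler(visual, latents, wq, wk, wv)
print(out.shape)  # (8, 16): same output size for any input length
```

Whatever the number of input frames or patches, the language model always receives `num_latents` visual tokens, which is what makes interleaving images and video with text tractable.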

https://www.deepmind.com/blog/tackling-multiple-tasks-with-a-single-visual-language-model
