Video Understanding

Video Understanding is the computer vision discipline that uses multimodal AI (e.g., VideoPrism) to interpret the temporal and spatial dynamics of video: extracting objects, actions, and context across frames.

This technology is a critical component of modern computer vision: it moves beyond static image recognition to follow the 'story' unfolding across a video clip. It relies on deep learning models such as Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) to analyze both spatial features (objects within each frame) and temporal features (motion and speed across frames). Key applications include autonomous driving, where systems predict pedestrian behavior, and security surveillance, where the technology flags suspicious activity in real time. At the research frontier, foundation models such as Google Research's VideoPrism achieve state-of-the-art results on 30 out of 33 video understanding benchmarks, in part by pretraining on massive datasets that include 36 million high-quality video-text pairs, which illustrates the data scale that general-purpose video models rely on.
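To make the spatial/temporal split concrete, here is a minimal sketch in Python, assuming a grayscale clip stored as nested lists indexed `[time][row][col]`. The hypothetical `tubelet_tokens` function cuts the clip into non-overlapping blocks spanning `t` frames and a `p x p` patch, in the spirit of the tubelet embeddings used by ViViT-style video transformers (an illustrative assumption, not VideoPrism's published pipeline). The equally hypothetical `motion_energy` function is a crude frame-differencing measure of temporal change.

```python
"""Sketch: spatio-temporal tokens and a simple motion feature for a video clip.

Assumption: tubelet tokenization in the style of ViViT-like video transformers,
shown for illustration only; not the pipeline of any specific model.
"""
from typing import List

Frame = List[List[float]]  # one grayscale frame, indexed [row][col]
Video = List[Frame]        # a clip, indexed [time][row][col]


def tubelet_tokens(video: Video, t: int, p: int) -> List[List[float]]:
    """Cut a clip into non-overlapping t x p x p blocks and flatten each block.

    Each token covers a p x p spatial patch over t consecutive frames, so it
    carries both spatial content and short-range temporal change.
    """
    T, H, W = len(video), len(video[0]), len(video[0][0])
    if T % t or H % p or W % p:
        raise ValueError("clip dimensions must be divisible by the tubelet size")
    tokens = []
    for t0 in range(0, T, t):          # step through time in chunks of t frames
        for y0 in range(0, H, p):      # step down the rows in chunks of p
            for x0 in range(0, W, p):  # step across the columns in chunks of p
                token = [
                    video[ti][yi][xi]
                    for ti in range(t0, t0 + t)
                    for yi in range(y0, y0 + p)
                    for xi in range(x0, x0 + p)
                ]
                tokens.append(token)
    return tokens


def motion_energy(video: Video) -> List[float]:
    """Mean absolute pixel change between each pair of consecutive frames."""
    energies = []
    for prev, curr in zip(video, video[1:]):
        diffs = [
            abs(c - q)
            for row_c, row_q in zip(curr, prev)
            for c, q in zip(row_c, row_q)
        ]
        energies.append(sum(diffs) / len(diffs))
    return energies


if __name__ == "__main__":
    # Toy 4-frame, 4x4 clip: one bright pixel moves one column right per frame.
    clip = [
        [[1.0 if (r == 0 and c == f) else 0.0 for c in range(4)] for r in range(4)]
        for f in range(4)
    ]
    toks = tubelet_tokens(clip, t=2, p=2)
    print(len(toks), len(toks[0]))  # 8 tokens, each with 2*2*2 = 8 values
    print(motion_energy(clip))      # [0.125, 0.125, 0.125]
```

On the toy clip, each frame transition changes 2 of the 16 pixels, so every motion-energy value is 0.125. A real video transformer would project each token through a learned linear layer (and add position embeddings) before attention; that step is omitted here to keep the tokenization itself visible.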

https://twelvelabs.io/

3 projects · 3 cities

Related technologies

BERT (179) · BLOOM (115) · GPT-3 (191) · GPT-4 (528) · Llama-2 (227) · PaLM 2 (116) · RoBERTa (118) · ABBYY FineReader (3) · Agentic (2) · Amazon Textract (5) · Claude Code (172) · Cloud Vision API (3) · clustering (3) · EasyOCR (2) · Embeddings (22) · Foundation models (5) · Gemini 3 (8) · KMeans (2)

Recent Talks & Demos

Members-Only

Detect Issues, Fix with Agents

Prague Feb 26

Gemini 3 Temporal

Video Foundation Models: Video First

Denver Nov 22

Foundation models GPT-4

Augmend: ML Video Documentation

Seattle Aug 8

Tesseract Speech Recognition