Sparse sync Projects .

Technology

Sparse sync

SparseSync is a transformer-based framework designed to synchronize audio and visual streams in unconstrained videos where alignment cues are infrequent or spatially small.

SparseSync tackles the challenge of audio-visual synchronization in 'in-the-wild' videos where cues are intermittent, such as a single dog bark or a distant wood chop. Unlike traditional models optimized for dense signals like talking heads, SparseSync uses a SparseSelector architecture to compress long temporal sequences into a manageable set of learnable tokens. This approach reduces computational complexity from quadratic to linear, allowing the model to process high-resolution, long-duration clips without sacrificing accuracy. By training on the VGGSound-Sparse dataset, the system achieves state-of-the-art performance in predicting precise temporal offsets even when the synchronization signal is sparse in both space and time.

https://github.com/v-iashin/SparseSync
1 project · 1 city

Related technologies

Recent Talks & Demos

Showing 1-1 of 1

Members-Only

Sign in to see who built these projects