
Transformers

The deep learning architecture that revolutionized sequence modeling (NLP, vision) by replacing recurrent units with a parallelizable multi-head self-attention mechanism.

The Transformer is a neural network architecture introduced in the landmark 2017 paper "Attention Is All You Need." By relying solely on self-attention, it eliminated the sequential processing bottleneck of prior recurrent neural networks (RNNs), enabling massive parallelization and substantially faster training on modern hardware. This efficiency made large-scale pre-trained models practical, including BERT (encoder-only) and the generative GPT series (decoder-only). The architecture is now foundational to modern Large Language Models (LLMs) and underpins the current state of the art in AI.
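At the heart of each attention head is scaled dot-product attention: queries are compared against keys, the scores are softmax-normalized, and the result weights a sum over values. A minimal NumPy sketch (function name and toy dimensions are illustrative, not from the paper's codebase):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                       # pairwise similarity scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)        # row-wise softmax
    return weights @ V                                    # weighted sum of value vectors

# Toy self-attention: 4 tokens, model dimension 8. In self-attention,
# Q, K, and V are all (projections of) the same input sequence, so every
# token can attend to every other token in parallel -- no recurrence.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # (4, 8): one output vector per token
```

Multi-head attention runs several such computations in parallel on learned linear projections of the input and concatenates the results, letting different heads attend to different relationships.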

https://doi.org/10.48550/arXiv.1706.03762