Transformer models
A neural network architecture that uses the self-attention mechanism to process sequential data in parallel, enabling the foundation models (BERT, GPT) that power modern AI.
The Transformer is the foundational neural network architecture for modern AI, introduced in the 2017 paper “Attention Is All You Need.” Its core innovation is the self-attention mechanism, which lets the model weigh the importance of every input token relative to every other token simultaneously, eliminating the sequential bottleneck of older Recurrent Neural Networks (RNNs). This parallelism drastically accelerated training on GPUs and TPUs. The architecture underpins all major Large Language Models (LLMs), including the GPT series and BERT, and drives state-of-the-art performance across machine translation, text generation, and even computer vision (Vision Transformers, ViT).
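The self-attention computation described above can be sketched in a few lines of NumPy. This is a minimal single-head illustration (no masking, no multi-head splitting, no learned parameters); the projection matrices `Wq`, `Wk`, `Wv` and the dimensions are illustrative placeholders, not anything from the original paper's configuration:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence X of shape (seq_len, d_model)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv           # project each token to query/key/value vectors
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)            # affinity of every token with every other token
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability before softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True) # softmax: each row sums to 1
    return weights @ V                         # each output mixes all tokens, weighted by attention

# Toy example: 4 tokens, 8-dimensional embeddings
rng = np.random.default_rng(0)
seq_len, d_model = 4, 8
X = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # one context-aware vector per input token: (4, 8)
```

Because the attention weights for all token pairs are computed as one matrix product, the whole sequence is processed at once, which is exactly the parallelism that distinguishes Transformers from step-by-step RNNs.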