Technology

decoder-only

This architecture is the workhorse of generative Large Language Models (LLMs), specializing in autoregressive text generation via causal (masked) multi-head self-attention.

The decoder-only model is a streamlined Transformer variant that drops the original encoder and keeps only the decoder stack, focusing exclusively on sequence generation (Source 1.3, 1.5). It predicts the next token from all preceding tokens in the sequence, appends that token to the input, and repeats, a process called autoregression (Source 1.8). The core mechanism is a stack of decoder blocks, each using masked self-attention so that a position cannot ‘look ahead’ at future tokens, which preserves causal integrity (Source 1.2, 1.9). This design powers models such as OpenAI's GPT series (GPT-3, GPT-4) and Meta's Llama family (Llama 2, Llama 3), making it the standard choice for tasks that require fluent, context-aware text generation (Source 1.3, 1.6).
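As a rough illustration of the two ideas above (causal masking and autoregression), here is a minimal single-head sketch in NumPy. It is not how GPT or Llama are implemented: the names `causal_self_attention`, `generate`, `embed`, `w_q`, `w_k`, `w_v`, and `w_out` are hypothetical, the weights are random, and a real decoder block would add multiple heads, positional information, residual connections, normalization, and a feed-forward layer.

```python
import numpy as np

def causal_self_attention(x, w_q, w_k, w_v):
    """Single-head masked self-attention over a (seq_len, d_model) input."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    # Causal mask: position i may attend only to positions j <= i.
    seq_len = x.shape[0]
    future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    # Row-wise softmax; masked entries become exactly 0 after exp().
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def generate(tokens, embed, w_q, w_k, w_v, w_out, steps):
    """Greedy autoregressive loop: predict one token, append it, repeat."""
    tokens = list(tokens)
    for _ in range(steps):
        h = causal_self_attention(embed[tokens], w_q, w_k, w_v)
        logits = h[-1] @ w_out          # only the last position predicts the next token
        tokens.append(int(np.argmax(logits)))
    return tokens

# Toy setup with random (untrained) weights, purely for illustration.
rng = np.random.default_rng(0)
vocab, d = 10, 8
embed = rng.normal(size=(vocab, d))
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))
w_out = rng.normal(size=(d, vocab))

out = generate([1, 2, 3], embed, w_q, w_k, w_v, w_out, steps=4)
print(out)  # the 3 prompt tokens followed by 4 generated token ids
```

Because of the mask, changing a later token never alters the outputs at earlier positions, which is the property that lets the same stack be used for both training on full sequences and token-by-token generation.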

https://machinelearningmastery.com/building-a-decoder-only-transformer-model-like-llama-2-and-llama-3/

1 project · 1 city

Related technologies

entropix (1) · layer looping (1) · Muon (1) · Transformer (11)

Recent Talks & Demos

Entropix: Compute for Molecular Structure

Milan May 8

decoder-only Transformer