
GPT-2

GPT-2 is a 1.5-billion-parameter, transformer-based language model from OpenAI (2019), trained on 40 GB of internet text (WebText) to predict the next word in a sequence, and notable for strong zero-shot performance across diverse tasks.

GPT-2 (Generative Pre-trained Transformer 2) is a large, transformer-based language model developed by OpenAI and released in stages over the course of 2019. The largest version has 1.5 billion parameters and was trained on WebText, a 40 GB dataset sourced from 8 million web pages. Its training objective was simple: predict the next word in a sequence. This simple goal produced a model capable of generating coherent, high-quality synthetic text conditioned on an arbitrary prompt. Critically, GPT-2 demonstrated a remarkable ability to perform downstream tasks (including summarization, translation, and question answering) in a 'zero-shot' setting, meaning it required no task-specific training data, and it achieved state-of-the-art results on several language-modeling benchmarks at the time.
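The next-word objective described above can be sketched in a few lines: the model outputs a probability for each candidate next token, and training minimizes the negative log-likelihood of the token that actually follows. The context, vocabulary, and probabilities below are invented purely for illustration; they are not GPT-2's real outputs.

```python
import math

def next_token_nll(predicted_probs, actual_next_token):
    """Negative log-likelihood of the observed next token.

    Lower loss means the model assigned higher probability to the
    continuation that actually occurred in the training text.
    """
    return -math.log(predicted_probs[actual_next_token])

# Hypothetical model output after seeing the context "the cat sat on the":
predicted_probs = {"mat": 0.6, "floor": 0.3, "moon": 0.1}

loss_likely = next_token_nll(predicted_probs, "mat")    # likely word, low loss
loss_unlikely = next_token_nll(predicted_probs, "moon") # unlikely word, high loss
print(round(loss_likely, 3), round(loss_unlikely, 3))
```

Summed over every position in a 40 GB corpus, minimizing this loss is the entire training signal; the zero-shot abilities emerge from that single objective rather than from any task-specific supervision.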

https://openai.com/blog/better-language-models-and-their-implications