ELECTRA
ELECTRA: A highly compute-efficient NLP pre-training model that uses a discriminator to detect replaced tokens, substantially outperforming masked language models (MLMs) at a given compute budget.
ELECTRA (Efficiently Learning an Encoder that Classifies Token Replacements Accurately) is a Google Research NLP model that shifts pre-training from a generative objective (like BERT's masked language modeling) to a discriminative one: it trains a model to distinguish 'real' input tokens from plausible 'fake' replacements, a method called Replaced Token Detection (RTD). This approach is highly compute-efficient because the discriminator learns from all input tokens, not just the 15% masked subset used in MLMs. For example, ELECTRA-Small, trained on a single GPU for just four days, surpasses GPT, a model that required over 30x more compute. At scale, ELECTRA achieves state-of-the-art results on benchmarks like SQuAD 2.0 while using less than a quarter of the compute required by comparable models such as RoBERTa and XLNet.
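The RTD objective described above can be sketched in a few lines. The following is a minimal NumPy illustration, not ELECTRA's implementation: the generator and discriminator networks are replaced by random stand-ins, and the sequence and vocabulary are toy-sized. It shows the key mechanics: ~15% of positions are masked and filled with proposed replacements, every position is labelled real or replaced, and the discriminator loss is computed over all tokens rather than only the masked subset.

```python
import numpy as np

rng = np.random.default_rng(0)

vocab_size = 20
tokens = rng.integers(0, vocab_size, size=16)  # toy input sequence

# 1) Mask ~15% of positions, as in MLM pre-training.
mask = rng.random(tokens.shape) < 0.15

# 2) A small "generator" proposes replacements for the masked
#    positions (uniform sampling here as a stand-in for the network).
proposals = rng.integers(0, vocab_size, size=tokens.shape)
corrupted = np.where(mask, proposals, tokens)

# 3) RTD labels: 1 = token was replaced, 0 = original. A sampled
#    proposal that happens to equal the original token is labelled
#    "original", matching the paper's formulation.
labels = (corrupted != tokens).astype(np.float64)

# 4) The discriminator scores EVERY position (not just the masked
#    ~15%), which is the source of ELECTRA's sample efficiency.
logits = rng.normal(size=tokens.shape)  # stand-in discriminator scores
probs = 1.0 / (1.0 + np.exp(-logits))
bce = -(labels * np.log(probs) + (1.0 - labels) * np.log(1.0 - probs))
loss = bce.mean()  # averaged over all tokens, masked and unmasked

print(f"replaced positions: {int(labels.sum())} / {len(tokens)}")
print(f"RTD loss: {loss:.4f}")
```

In the real model the generator is a small MLM trained jointly with the discriminator, and after pre-training the generator is discarded; only the discriminator is fine-tuned on downstream tasks.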