TinyBERT
TinyBERT uses a two-stage Transformer distillation process to compress BERT models by 7.5x while retaining roughly 96% of the teacher's performance on GLUE.
Developed by Huawei Noah's Ark Lab, TinyBERT bridges the gap between massive language models and edge-device deployment. It employs a Transformer distillation method that transfers knowledge at three levels: the embedding layer, the hidden states, and the self-attention matrices. By reducing parameters from 110 million to 14.5 million, the model achieves inference speeds up to 9.4x faster than BERT-base. This efficiency makes it a standard choice for real-time NLP tasks on mobile hardware (ARM/Android) without sacrificing the accuracy expected on the GLUE benchmark.
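The layer-wise objective described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: states are plain Python lists, the learned projection that maps the student's narrower hidden states to the teacher's width is reduced to a single scalar `w`, and all function names are hypothetical.

```python
def mse(a, b):
    """Mean-squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def tinybert_layer_loss(student_attn, teacher_attn,
                        student_hidden, teacher_hidden, w=1.0):
    """Per-layer distillation loss in the spirit of TinyBERT (sketch).

    Attention loss: compare the student's attention matrices
    (flattened to vectors here) directly against the teacher's.
    Hidden-state loss: project the student's hidden state with a
    learned transform -- simplified to the scalar w -- before
    comparing to the teacher's hidden state.
    """
    attn_loss = mse(student_attn, teacher_attn)
    hidden_loss = mse([w * h for h in student_hidden], teacher_hidden)
    return attn_loss + hidden_loss
```

In the real method these losses are summed over mapped layer pairs (the 4-layer student mimics selected layers of the 12-layer teacher), and an analogous MSE term is applied once at the embedding layer.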