
DistilBERT

DistilBERT is a compact, high-efficiency transformer model: 40% smaller and 60% faster than BERT, while retaining 97% of its language-understanding performance on the GLUE benchmark.

DistilBERT is a distilled version of the BERT base model, engineered for high computational efficiency. The architecture is simplified: it cuts the number of transformer layers in half (from 12 to 6) and removes the token-type embeddings and the pooler. This results in a 40% reduction in parameters and a 60% increase in inference speed compared to BERT. The model is trained using knowledge distillation, where a smaller student model learns from the larger BERT teacher model via a triple loss function (language modeling, distillation, and cosine-distance losses). This process allows DistilBERT to retain 97% of BERT's language understanding capabilities, making it ideal for low-latency, resource-constrained, and on-device NLP applications.
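The triple loss described above can be sketched in plain Python. This is a minimal illustration of the three components (masked language modeling, soft-target distillation, and cosine-distance alignment of hidden states); the temperature and the loss weights below are illustrative placeholders, not the values used to train DistilBERT:

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Soft-target loss: cross-entropy of the student's softened
    distribution against the teacher's softened distribution."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return -sum(pt * math.log(ps) for pt, ps in zip(p_teacher, p_student))

def cosine_distance(u, v):
    """1 - cosine similarity between student and teacher hidden states."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def triple_loss(mlm_loss, student_logits, teacher_logits,
                student_hidden, teacher_hidden,
                alpha=1.0, beta=1.0, gamma=1.0):
    """Weighted sum of the three training losses.
    alpha/beta/gamma are hypothetical weights for illustration."""
    return (alpha * mlm_loss
            + beta * distillation_loss(student_logits, teacher_logits)
            + gamma * cosine_distance(student_hidden, teacher_hidden))
```

In practice the masked-language-modeling term comes from the student's own predictions on masked tokens, while the distillation and cosine terms pull the student toward the teacher's output distribution and intermediate representations, respectively.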

https://huggingface.co/distilbert/distilbert-base-uncased
3 projects · 3 cities

