DistilBERT
DistilBERT is a compact, high-efficiency transformer model: 40% smaller and 60% faster than BERT, while retaining 97% of its performance on the GLUE benchmark.
DistilBERT is a distilled version of the BERT base model, engineered for computational efficiency. Its architecture is a simplified BERT: the number of transformer layers is halved (from 12 to 6), and the token-type embeddings and the pooler are removed. The result is roughly 40% fewer parameters and about 60% faster inference than BERT. The model is trained with knowledge distillation, in which a smaller student model learns from the larger BERT teacher through a triple loss: a masked language modeling loss on the training text, a distillation loss over the teacher's softened output probabilities, and a cosine-distance loss that aligns the two models' hidden states. This training regime lets DistilBERT retain 97% of BERT's language understanding capability, making it well suited to low-latency, resource-constrained, and on-device NLP applications.