MiniLM
Microsoft's distilled transformer that retains over 99% of BERT's accuracy on key benchmarks at roughly half the parameters and compute.
MiniLM uses deep self-attention distillation to compress large pre-trained transformers such as BERT and RoBERTa into efficient, production-ready student models. By mimicking the teacher's last-layer self-attention distributions and the relations among its value vectors, the 12-layer MiniLM retains over 99% of its teacher's accuracy on SQuAD 2.0 and several GLUE tasks while markedly reducing latency; the all-MiniLM-L12-v2 checkpoint applies the same architecture to sentence embeddings. It is a strong choice for edge deployment and real-time NLP tasks where CPU overhead must stay low without sacrificing linguistic nuance.
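A minimal sketch of the distillation objective described above, assuming PyTorch; the function and tensor names are illustrative, not from an official implementation, but the two loss terms follow the MiniLM paper's formulation:

```python
import torch
import torch.nn.functional as F

def minilm_distillation_loss(teacher_attn, student_attn, teacher_v, student_v):
    """Deep self-attention distillation (hypothetical helper).

    teacher_attn, student_attn: last-layer attention probabilities,
        shape (batch, heads, seq_len, seq_len), already softmax-normalized.
    teacher_v, student_v: last-layer value vectors,
        shape (batch, heads, seq_len, head_dim).
    """
    eps = 1e-9  # avoid log(0) when taking logs of probabilities

    # (1) KL divergence between teacher and student attention distributions.
    attn_loss = F.kl_div(
        torch.log(student_attn + eps), teacher_attn, reduction="batchmean"
    )

    # (2) Value-relation transfer: KL divergence between the scaled
    # dot-product relations among value vectors of teacher and student.
    teacher_vr = F.softmax(
        teacher_v @ teacher_v.transpose(-1, -2) / teacher_v.size(-1) ** 0.5, dim=-1
    )
    student_vr = F.softmax(
        student_v @ student_v.transpose(-1, -2) / student_v.size(-1) ** 0.5, dim=-1
    )
    vr_loss = F.kl_div(torch.log(student_vr + eps), teacher_vr, reduction="batchmean")

    return attn_loss + vr_loss
```

For inference, the distilled sentence-embedding variants are typically loaded through the sentence-transformers library, e.g. SentenceTransformer("all-MiniLM-L12-v2").encode(texts), which returns dense vectors suitable for CPU-bound semantic search.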
4 projects · 4 cities
Recent Talks & Demos