
Technology

MiniLM

Microsoft's distilled transformer that retains over 99% of its teacher's accuracy with roughly 50% of the parameters and compute.

MiniLM uses deep self-attention distillation to compress large pre-trained transformers such as BERT and RoBERTa into efficient, production-ready models. The student mimics the self-attention distributions and value relations of the teacher's last transformer layer; the distilled models retain over 99% of the teacher's accuracy on SQuAD 2.0 and several GLUE tasks at a fraction of the size and latency. This makes MiniLM a strong choice for edge deployment and real-time NLP workloads where CPU cost must stay low without sacrificing linguistic nuance.

https://github.com/microsoft/unilm/tree/master/minilm
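The distillation objective described above can be sketched in a few lines. This is a minimal numpy illustration, not Microsoft's implementation: the tensor shapes, dictionary keys, and function names are assumptions for the sketch. It computes the two MiniLM terms, a KL divergence between teacher and student self-attention distributions, and a KL divergence between their value relations (scaled dot-products among values), for one layer.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_dist(q, k):
    # scaled dot-product attention distribution; q, k: (heads, seq, head_dim)
    d = q.shape[-1]
    return softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d))

def value_relation(v):
    # value relation: scaled dot-product among values, as in MiniLM
    d = v.shape[-1]
    return softmax(v @ v.transpose(0, 2, 1) / np.sqrt(d))

def kl(p, q, eps=1e-12):
    # KL(p || q), averaged over heads and query positions
    return np.sum(p * (np.log(p + eps) - np.log(q + eps))) / (p.shape[0] * p.shape[1])

def minilm_distill_loss(teacher, student):
    # teacher/student: dicts holding last-layer "q", "k", "v",
    # each of shape (heads, seq, head_dim) -- hypothetical layout
    att_loss = kl(attention_dist(teacher["q"], teacher["k"]),
                  attention_dist(student["q"], student["k"]))
    val_loss = kl(value_relation(teacher["v"]),
                  value_relation(student["v"]))
    return att_loss + val_loss
```

Because the student only has to match the teacher's last-layer attention behaviour, it is free to use fewer layers and a different hidden size, which is what enables the aggressive compression.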
4 projects · 4 cities

