MiniLM
Microsoft's distilled transformer that retains over 99% of BERT's accuracy on key benchmarks at roughly half the parameters and compute.
MiniLM uses deep self-attention distillation to compress large pre-trained transformers such as BERT and RoBERTa into efficient, production-ready student models. By mimicking the teacher's last-layer self-attention distributions and the relations among its value vectors, the 12-layer MiniLM retains over 99% of its teacher's accuracy on SQuAD 2.0 and several GLUE tasks while markedly reducing latency; the all-MiniLM-L12-v2 checkpoint applies the same architecture to sentence embeddings. It is a strong choice for edge deployment and real-time NLP tasks where CPU overhead must stay low without sacrificing linguistic nuance.
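A minimal sketch of the distillation objective described above, assuming PyTorch; the function and tensor names are illustrative, not from an official implementation, but the two loss terms follow the MiniLM paper's formulation:

```python
import torch
import torch.nn.functional as F

def minilm_distillation_loss(teacher_attn, student_attn, teacher_v, student_v):
    """Deep self-attention distillation (hypothetical helper).

    teacher_attn, student_attn: last-layer attention probabilities,
        shape (batch, heads, seq_len, seq_len), already softmax-normalized.
    teacher_v, student_v: last-layer value vectors,
        shape (batch, heads, seq_len, head_dim).
    """
    eps = 1e-9  # avoid log(0) when taking logs of probabilities

    # (1) KL divergence between teacher and student attention distributions.
    attn_loss = F.kl_div(
        torch.log(student_attn + eps), teacher_attn, reduction="batchmean"
    )

    # (2) Value-relation transfer: KL divergence between the scaled
    # dot-product relations among value vectors of teacher and student.
    teacher_vr = F.softmax(
        teacher_v @ teacher_v.transpose(-1, -2) / teacher_v.size(-1) ** 0.5, dim=-1
    )
    student_vr = F.softmax(
        student_v @ student_v.transpose(-1, -2) / student_v.size(-1) ** 0.5, dim=-1
    )
    vr_loss = F.kl_div(torch.log(student_vr + eps), teacher_vr, reduction="batchmean")

    return attn_loss + vr_loss
```

For inference, the distilled sentence-embedding variants are typically loaded through the sentence-transformers library, e.g. SentenceTransformer("all-MiniLM-L12-v2").encode(texts), which returns dense vectors suitable for CPU-bound semantic search.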
4 projects · 4 cities
Recent Talks & Demos