
SqueezeBERT

SqueezeBERT replaces BERT's position-wise fully-connected layers with grouped convolutions, running 4.3x faster than BERT-base on a Pixel 3 smartphone while maintaining competitive accuracy on the GLUE benchmark.

Developed by researchers at UC Berkeley, SqueezeBERT adapts efficiency techniques from computer vision to the transformer architecture, replacing the position-wise fully-connected layers (the Q/K/V projections and feed-forward blocks) with grouped convolutions. The self-attention computation itself is left unchanged; the speedup comes from the pointwise layers, which account for most of BERT-base's runtime. This yields a 4.3x speedup over BERT-base on a Pixel 3 smartphone while scoring 78.1 on the GLUE test set, making SqueezeBERT a strong choice for deploying high-speed NLP pipelines on edge devices and other resource-constrained environments.

https://github.com/SqueezeBERT/SqueezeBERT