SqueezeBERT
SqueezeBERT uses grouped convolutions to run 4.3x faster than BERT-base on a Pixel 3 smartphone while maintaining competitive accuracy on the GLUE benchmark.
Developed by researchers at UC Berkeley, SqueezeBERT brings techniques from efficient computer-vision networks to the transformer architecture, replacing the position-wise fully-connected layers with grouped convolutions. Because these fully-connected layers, rather than the self-attention mechanism, account for most of BERT's compute, the substitution yields a 4.3x speedup over BERT-base on a Pixel 3 smartphone. The model remains strong across natural language tasks, scoring 78.1 on the GLUE benchmark, which makes it a practical choice for deploying high-speed NLP pipelines on edge devices and other resource-constrained environments.
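To illustrate why grouping helps, here is a minimal NumPy sketch of the core idea, not SqueezeBERT's actual implementation: a grouped 1x1 convolution splits the channel dimension into g groups and applies an independent, smaller weight matrix to each, cutting parameters and multiply-adds by a factor of g relative to a dense position-wise layer. The dimensions (768 channels, 4 groups) are illustrative.

```python
import numpy as np

def grouped_pointwise_op(x, weights):
    """Grouped 1x1 convolution: split channels into g groups, apply an
    independent weight matrix to each group, then re-concatenate."""
    g = len(weights)
    chunks = np.split(x, g, axis=-1)                 # g blocks of (seq, c/g)
    return np.concatenate([c @ w for c, w in zip(chunks, weights)], axis=-1)

rng = np.random.default_rng(0)
seq_len, channels, groups = 128, 768, 4

# Dense (BERT-style) position-wise layer: one (c, c) weight matrix.
dense_params = channels * channels                   # 589,824 parameters

# Grouped version: g matrices of shape (c/g, c/g) -> g-fold reduction.
weights = [rng.standard_normal((channels // groups, channels // groups))
           for _ in range(groups)]
grouped_params = sum(w.size for w in weights)        # 147,456 parameters

x = rng.standard_normal((seq_len, channels))
y = grouped_pointwise_op(x, weights)                 # same (seq, c) shape out
print(dense_params // grouped_params)                # → 4
```

The output keeps the same shape as a dense layer's, so the grouped operation can drop in wherever a position-wise fully-connected layer sits in the transformer block, trading cross-group mixing for a proportional reduction in compute.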