Technology
State Space Model
State Space Models (SSMs) provide a linear-scaling alternative to Transformers by modeling sequences through continuous differential equations.
SSMs like Mamba avoid the quadratic bottleneck of the attention mechanism (O(n²)) by using a recurrent formulation that processes sequences in linear time (O(n)). By mapping the input signal into a hidden state via matrices A and B, these models maintain a constant-size memory state regardless of context length. This architecture enables up to 5x higher inference throughput than similarly sized Transformer models such as Llama on long-form tasks, while matching their performance on benchmarks like the Pile and WikiText-103. This makes SSMs a strong choice for hardware-efficient sequence modeling in genomics, audio processing, and long-context AI.
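The recurrence described above can be sketched in a few lines. This is a minimal, illustrative NumPy sketch of a discretized linear state space recurrence, not Mamba's actual (selective, hardware-aware) implementation; the function name `ssm_scan` and all matrix values are hypothetical.

```python
import numpy as np

def ssm_scan(x, A, B, C):
    """Run a discretized linear state space recurrence over a sequence.

    h_t = A @ h_{t-1} + B @ x_t   (hidden state update via matrices A and B)
    y_t = C @ h_t                 (readout)

    The hidden state h has a fixed size regardless of sequence length,
    and the loop runs one step per token, i.e. O(n) time.
    """
    d_state = A.shape[0]
    h = np.zeros(d_state)
    ys = []
    for x_t in x:              # one step per input token: linear time
        h = A @ h + B @ x_t    # constant-size state, updated in place
        ys.append(C @ h)
    return np.array(ys)

# Toy example: scalar inputs, 4-dimensional hidden state (values illustrative).
rng = np.random.default_rng(0)
A = 0.9 * np.eye(4)            # stable state-transition matrix
B = rng.normal(size=(4, 1))
C = rng.normal(size=(1, 4))
x = rng.normal(size=(16, 1))   # sequence of 16 scalar inputs

y = ssm_scan(x, A, B, C)
print(y.shape)                 # one output per timestep: (16, 1)
```

Note that because the state h is the only thing carried between steps, memory use stays constant no matter how long the input sequence grows; this is the property that distinguishes SSMs from attention, which must keep the full key/value history.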