Technology
Agent Attention
Agent Attention is a novel Transformer attention mechanism: it integrates Softmax and Linear Attention via agent tokens, achieving high expressiveness with linear computational efficiency.
This technology directly addresses the efficiency-expressiveness trade-off in Transformers. Agent Attention introduces a small set of agent tokens (A) alongside the conventional attention triplet (Q, K, V). The agents first act as queries to aggregate global information from the keys and values, then act as keys and values to broadcast that information back to the original query tokens (Q). This two-stage scheme combines the expressive power of Softmax attention with the low computational cost of linear attention. The result is a highly efficient, plug-in module: it delivers up to a 2.2x acceleration in image generation for models like Stable Diffusion while preserving high image quality across diverse vision tasks.
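The two-stage scheme above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: names such as `agent_attention` and the tensor shapes are assumptions for the example. With n query tokens, m agent tokens, and m much smaller than n, the cost drops from O(n²·d) for full Softmax attention to O(n·m·d).

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def agent_attention(Q, K, V, A):
    """Two-stage agent attention (illustrative sketch).

    Q, K, V: (n, d) query/key/value tokens.
    A:       (m, d) agent tokens, with m << n.
    Returns: (n, d) attended output.
    """
    d = Q.shape[-1]
    # Stage 1: agents act as queries and aggregate global
    # information from the keys and values.
    V_agent = softmax(A @ K.T / np.sqrt(d)) @ V        # (m, d)
    # Stage 2: agents act as keys/values and broadcast the
    # aggregated information back to the original queries.
    return softmax(Q @ A.T / np.sqrt(d)) @ V_agent     # (n, d)
```

Because both stages are ordinary Softmax attention, the module can replace a standard attention block without changing the surrounding architecture; only the number of agent tokens m is a new hyperparameter.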