Technology
Agent Attention
Agent Attention is a novel Transformer attention mechanism: it integrates Softmax and Linear Attention via agent tokens, achieving high expressiveness with linear computational efficiency.
This technology directly addresses the efficiency-expressiveness trade-off in Transformers. Agent Attention introduces a small set of agent tokens (A) alongside the conventional attention triplet (Q, K, V). The agents first act as queries to aggregate global information from the keys and values, then act as keys and values to broadcast that information back to the original query tokens (Q). This two-stage scheme combines the expressive power of Softmax attention with the low computational cost of linear attention. The result is a highly efficient, plug-in module: it delivers up to a 2.2x acceleration in image generation for models like Stable Diffusion while preserving high image quality across diverse vision tasks.
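The two-stage scheme above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: names such as `agent_attention` and the tensor shapes are assumptions for the example. With n query tokens, m agent tokens, and m much smaller than n, the cost drops from O(n²·d) for full Softmax attention to O(n·m·d).

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def agent_attention(Q, K, V, A):
    """Two-stage agent attention (illustrative sketch).

    Q, K, V: (n, d) query/key/value tokens.
    A:       (m, d) agent tokens, with m << n.
    Returns: (n, d) attended output.
    """
    d = Q.shape[-1]
    # Stage 1: agents act as queries and aggregate global
    # information from the keys and values.
    V_agent = softmax(A @ K.T / np.sqrt(d)) @ V        # (m, d)
    # Stage 2: agents act as keys/values and broadcast the
    # aggregated information back to the original queries.
    return softmax(Q @ A.T / np.sqrt(d)) @ V_agent     # (n, d)
```

Because both stages are ordinary Softmax attention, the module can replace a standard attention block without changing the surrounding architecture; only the number of agent tokens m is a new hyperparameter.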