Technology

LPU architecture

The Language Processing Unit (LPU) architecture, developed by Groq, is an ASIC designed for ultra-low-latency AI inference, leveraging a deterministic, software-first approach.

Groq’s Language Processing Unit (LPU) is a custom Application-Specific Integrated Circuit (ASIC) engineered to minimize latency in AI inference, particularly for Large Language Models (LLMs). The LPU achieves its speed through a programmable assembly-line architecture and by integrating memory directly on-chip as SRAM, bypassing the off-chip memory bottleneck (the "memory wall") that constrains GPUs. The compiler schedules every operation statically, so execution is deterministic to the clock cycle; combined with on-chip SRAM, this yields an effective memory bandwidth of up to 80 TB/s and real-world performance such as running Llama-2 70B at over 300 tokens per second per user. The LPU is purpose-built for the decode phase of generative AI inference, enabling near-instant response times.
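The arithmetic linking those two figures can be sketched roughly: in the decode phase, each generated token requires streaming the full weight set from memory, so the sustainable per-user token rate is bounded by memory bandwidth divided by bytes per model pass. A minimal back-of-envelope check, assuming 8-bit weights and one full weight read per token (assumptions for illustration, not stated in the source):

```python
# Back-of-envelope check of the decode-phase bandwidth claim.
params = 70e9            # Llama-2 70B parameter count
bytes_per_param = 1      # assumed 8-bit quantization
tokens_per_sec = 300     # per-user rate cited above

# Decode is memory-bound: every token reads all weights once.
bytes_per_token = params * bytes_per_param
required_bw = bytes_per_token * tokens_per_sec   # bytes/sec

print(f"{required_bw / 1e12:.0f} TB/s needed vs 80 TB/s available")
```

Under these assumptions a single user at 300 tokens/s needs about 21 TB/s of weight traffic, comfortably inside the 80 TB/s figure, which is consistent with the claim that on-chip bandwidth, not compute, is what the design removes as the bottleneck.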

https://groq.com/

