Groq LPU
The Groq LPU (Language Processing Unit) is the world's fastest AI inference chip, purpose-built for low-latency, high-throughput generative AI and Large Language Models (LLMs).
The Groq LPU is a next-generation processor built on a Tensor Streaming Processor (TSP) architecture; it is not a GPU. Its single-core design eliminates memory bottlenecks by integrating hundreds of megabytes of SRAM directly on-chip, bypassing the latency of traditional off-chip memory. A custom compiler schedules execution deterministically, which translates into predictable, record-setting performance: for example, over 300 tokens per second on Llama 2 70B and 480 tokens per second on Mixtral 8x7B. This specialized approach also makes the LPU up to 10x more energy efficient than conventional GPUs for LLM inference, delivering both speed and cost efficiency at scale.
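The throughput figures above translate directly into user-facing latency. As a rough back-of-envelope sketch (a hypothetical helper for illustration, not part of any Groq API), sustained tokens-per-second can be converted into expected generation time:

```python
def generation_time_s(num_tokens: int, tokens_per_second: float) -> float:
    """Estimate wall-clock seconds to generate num_tokens at a sustained rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second

# At the ~300 tokens/s cited for Llama 2 70B, a 500-token response
# takes under two seconds of generation time.
print(round(generation_time_s(500, 300), 2))
```

Note this covers only token generation, not prompt processing or network overhead, and assumes the rate is sustained for the full response.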