Groq LPU
The Groq LPU (Language Processing Unit) is the world's fastest AI inference chip, purpose-built for low-latency, high-throughput generative AI and Large Language Models (LLMs).
The Groq LPU is a next-generation processor built on a Tensor Streaming Processor (TSP) architecture; it is not a GPU. Its single-core design eliminates memory bottlenecks by integrating hundreds of megabytes of SRAM directly on-chip, bypassing the latency of traditional off-chip memory. A custom compiler schedules execution deterministically, which translates into predictable, record-setting performance: for example, over 300 tokens per second on Llama 2 70B and 480 tokens per second on Mixtral 8x7B. This specialized approach also makes the LPU up to 10x more energy efficient than conventional GPUs for LLM inference, delivering both speed and cost efficiency at scale.
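The throughput figures above translate directly into user-facing latency. As a rough back-of-envelope sketch (a hypothetical helper for illustration, not part of any Groq API), sustained tokens-per-second can be converted into expected generation time:

```python
def generation_time_s(num_tokens: int, tokens_per_second: float) -> float:
    """Estimate wall-clock seconds to generate num_tokens at a sustained rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second

# At the ~300 tokens/s cited for Llama 2 70B, a 500-token response
# takes under two seconds of generation time.
print(round(generation_time_s(500, 300), 2))
```

Note this covers only token generation, not prompt processing or network overhead, and assumes the rate is sustained for the full response.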