
llama.cpp

The definitive C/C++ library for high-performance LLM inference: run quantized GGUF models on commodity hardware, CPU-first.

llama.cpp is a C/C++ engine for efficient Large Language Model (LLM) inference, built on the GGML tensor library. It follows a CPU-first philosophy, enabling models up to 70B parameters to run locally on diverse hardware (laptops, Raspberry Pi) with minimal dependencies. This efficiency comes from the GGUF file format and low-bit quantization techniques (1.5-bit to 8-bit), which let 7B+ parameter models run comfortably in just 4–8 GB of RAM. The project provides a fast, hackable framework, including command-line tools (`llama-cli`) and an OpenAI-compatible web server (`llama-server`), with state-of-the-art performance across a range of backends, including robust GPU support.
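To make the memory claim concrete, here is a simplified sketch of block-wise low-bit quantization, the core idea behind GGUF's quantized formats. This is an illustrative model only: the real Q4-family formats add per-block minimums, super-block structure, and careful bit packing, none of which are shown here.

```python
def quantize_block(weights, bits=4):
    """Map a block of floats to small signed ints plus one float scale.

    Illustrative only -- not the actual GGUF Q4 layout.
    """
    levels = (1 << (bits - 1)) - 1              # e.g. 7 for 4-bit signed
    scale = max(abs(w) for w in weights) / levels or 1.0
    q = [round(w / scale) for w in weights]     # each entry fits in `bits` bits
    return q, scale

def dequantize_block(q, scale):
    """Recover approximate float weights from the quantized block."""
    return [v * scale for v in q]

block = [0.12, -0.53, 0.88, -0.07]
q, s = quantize_block(block)
approx = dequantize_block(q, s)

# Each weight now costs ~4 bits instead of 32, so a 7B-parameter model
# shrinks from ~28 GB (fp32) to roughly 4 GB plus per-block scales --
# which is why 7B models fit in the 4-8 GB of RAM cited above.
```

The trade-off is a small per-weight rounding error, bounded by half the block's scale, which is why finer-grained blocks and extra per-block metadata (as in the real GGUF formats) improve quality at a modest size cost.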

https://github.com/ggml-org/llama.cpp
12 projects · 15 cities
