llama.cpp
The definitive C/C++ library for high-performance LLM inference: run quantized GGUF models on commodity hardware, CPU-first.
llama.cpp is a C/C++ engine for efficient Large Language Model (LLM) inference, built on the GGML tensor library. It follows a CPU-first philosophy, enabling models up to 70B parameters to run locally on diverse hardware (laptops, Raspberry Pi) with minimal dependencies. This efficiency is driven by the GGUF file format and quantization at bit widths from 1.5-bit to 8-bit, which lets 7B-parameter models run comfortably in just 4–8 GB of RAM. The project provides a fast, hackable framework, including command-line tools (`llama-cli`) and an OpenAI-compatible web server (`llama-server`), with state-of-the-art performance across backends, including robust GPU support.
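The 4–8 GB figure follows from simple arithmetic on weight storage. A rough sketch (the helper below is hypothetical, not part of llama.cpp, and counts weights only, ignoring KV-cache and activation overhead):

```python
# Back-of-envelope estimate of weight memory for a quantized model.
# Hypothetical helper for illustration; not a llama.cpp API.
def quantized_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate storage in GB for n_params weights at a given bit width."""
    return n_params * bits_per_weight / 8 / 1e9

# A 7B-parameter model at a few of the supported bit widths:
for bits in (1.5, 4.0, 8.0, 16.0):
    print(f"{bits:>4} bits/weight -> {quantized_size_gb(7e9, bits):.2f} GB")
# 4-bit lands at 3.50 GB and 8-bit at 7.00 GB, which is why a 7B model
# fits in the 4-8 GB range, versus 14 GB for unquantized 16-bit weights.
```

Actual resident memory is somewhat higher once the KV cache and runtime buffers are included, but the weight term dominates for small context sizes.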