Technology

GGUF

GGUF is a memory-mapped, single-file binary format for the efficient deployment of quantized Large Language Models (LLMs).

GGUF is the standard file format of the GGML ecosystem (e.g., llama.cpp), engineered for streamlined LLM inference, especially on resource-constrained hardware. It is a single, self-contained binary that consolidates all model weights, metadata, and configuration (such as tokenizer details) into one file. This design makes the file mmap-compatible (memory-mappable), enabling fast, lazy loading of models like Llama, Mistral, and Phi-3. Crucially, GGUF supports a range of blockwise quantization schemes, such as Q4_K and Q6_K, significantly reducing the memory footprint while largely preserving model quality.
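As a rough sketch of the single-file layout, a GGUF file opens with a fixed header: the 4-byte magic `GGUF`, a version number, a tensor count, and a metadata key-value count, all little-endian (counts are 64-bit from version 2 onward). The helper below is a minimal, hypothetical parser of just that header, demonstrated on a synthetic byte string rather than a real model file:

```python
import struct

GGUF_MAGIC = b"GGUF"  # little-endian magic at the start of every GGUF file

def read_gguf_header(buf: bytes) -> dict:
    """Parse the fixed-size GGUF header from the start of a buffer."""
    if buf[:4] != GGUF_MAGIC:
        raise ValueError("not a GGUF file")
    # After the magic (GGUF v2+): uint32 version, uint64 tensor_count,
    # uint64 metadata_kv_count, all little-endian.
    version, tensor_count, kv_count = struct.unpack_from("<IQQ", buf, 4)
    return {"version": version,
            "tensor_count": tensor_count,
            "kv_count": kv_count}

# Demo on a synthetic header (not a real model), claiming version 3,
# 291 tensors, and 24 metadata key-value pairs.
demo = GGUF_MAGIC + struct.pack("<IQQ", 3, 291, 24)
print(read_gguf_header(demo))
```

The metadata key-value pairs that follow the header are what make the file self-describing: tokenizer vocabulary, architecture name, and quantization details all live there, so no sidecar config files are needed.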

https://github.com/ggerganov/ggml
7 projects · 7 cities

Related technologies

Recent Talks & Demos

