Technology

GGUF

GGUF is a memory-mapped, single-file binary format for the efficient deployment of quantized Large Language Models (LLMs).

GGUF is the standard file format of the GGML ecosystem (e.g., llama.cpp), engineered for streamlined LLM inference, especially on resource-constrained hardware. It is a single, self-contained binary that consolidates all model weights, metadata, and configuration (such as tokenizer details) into one file. This design makes the file mmap-compatible (memory-mappable), enabling fast, lazy loading of models like Llama, Mistral, and Phi-3. Crucially, GGUF supports a range of blockwise quantization schemes, such as Q4_K and Q6_K, significantly reducing the memory footprint while largely preserving model quality.
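As a rough sketch of the single-file layout, a GGUF file opens with a fixed header: the 4-byte magic `GGUF`, a version number, a tensor count, and a metadata key-value count, all little-endian (counts are 64-bit from version 2 onward). The helper below is a minimal, hypothetical parser of just that header, demonstrated on a synthetic byte string rather than a real model file:

```python
import struct

GGUF_MAGIC = b"GGUF"  # little-endian magic at the start of every GGUF file

def read_gguf_header(buf: bytes) -> dict:
    """Parse the fixed-size GGUF header from the start of a buffer."""
    if buf[:4] != GGUF_MAGIC:
        raise ValueError("not a GGUF file")
    # After the magic (GGUF v2+): uint32 version, uint64 tensor_count,
    # uint64 metadata_kv_count, all little-endian.
    version, tensor_count, kv_count = struct.unpack_from("<IQQ", buf, 4)
    return {"version": version,
            "tensor_count": tensor_count,
            "kv_count": kv_count}

# Demo on a synthetic header (not a real model), claiming version 3,
# 291 tensors, and 24 metadata key-value pairs.
demo = GGUF_MAGIC + struct.pack("<IQQ", 3, 291, 24)
print(read_gguf_header(demo))
```

The metadata key-value pairs that follow the header are what make the file self-describing: tokenizer vocabulary, architecture name, and quantization details all live there, so no sidecar config files are needed.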

https://github.com/ggerganov/ggml
7 projects · 7 cities

Related technologies

Recent Talks & Demos

