GGML
GGML is a high-performance tensor library written in C that enables large language model (LLM) inference on standard consumer hardware through efficient integer quantization.
GGML (named for its author Georgi Gerganov's initials, "GG", plus "ML") is a foundational C library for machine learning, engineered for efficient Transformer inference. Its core strengths are a low-level, cross-platform design and integer quantization support, which significantly reduce the memory footprint of large models. This allows models such as OpenAI's Whisper and Meta's LLaMA to run on CPUs and consumer-grade GPUs, a critical shift for edge AI deployment. It offers several quantization strategies (e.g., 4-bit, 5-bit, and 8-bit) and operates with no third-party dependencies and no memory allocations during runtime, underpinning the performance and reliability of projects like `llama.cpp` and `whisper.cpp`.