Wllama
A WebAssembly binding for llama.cpp, enabling direct, in-browser LLM inference on the CPU.
Wllama runs Large Language Models (LLMs) directly in a web browser, with no backend server or dedicated GPU required. Built on the `llama.cpp` project, it uses WebAssembly (Wasm) to execute GGUF-formatted models on the client's CPU, running inference inside a Web Worker so the UI stays responsive. The package offers both high-level APIs (completions, embeddings) and low-level control (KV cache, sampling), and handles large models by downloading split GGUF files in parallel, working around the browser's 2 GB size limit on a single file. The result is local, private, and portable LLM capability across a wide range of devices.
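As a sketch of the high-level flow, the snippet below loads a GGUF model from a URL and runs a text completion. It follows the `@wllama/wllama` TypeScript API (`Wllama`, `loadModelFromUrl`, `createCompletion`); the asset paths and model URL are placeholders you would replace with your own hosted files, and the exact config keys may vary between package versions.

```typescript
import { Wllama } from '@wllama/wllama';

// Paths to the Wasm binaries shipped with the package, served as static
// assets by your web app. Placeholder paths; adjust to your bundler setup.
const CONFIG_PATHS = {
  'single-thread/wllama.wasm': '/assets/single-thread/wllama.wasm',
  'multi-thread/wllama.wasm': '/assets/multi-thread/wllama.wasm',
};

async function main(): Promise<void> {
  const wllama = new Wllama(CONFIG_PATHS);

  // Placeholder model URL. For models over the 2 GB single-file limit,
  // pointing at the first split shard lets the shards be fetched in parallel.
  await wllama.loadModelFromUrl(
    'https://example.com/models/tinyllama-q4_k_m.gguf'
  );

  // Generate up to 64 tokens of completion text for the prompt.
  const output = await wllama.createCompletion('Q: What is WebAssembly?\nA:', {
    nPredict: 64,
  });
  console.log(output);
}

main();
```

Because inference runs in a worker on the client's CPU, the `await` calls above resolve without blocking the page's main thread.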
Related technologies
Recent Talks & Demos