Wllama
A WebAssembly binding for llama.cpp, enabling direct, in-browser LLM inference on the CPU.
Wllama runs Large Language Models (LLMs) directly in a web browser, with no backend server or dedicated GPU required. Built on the `llama.cpp` project, it uses WebAssembly (Wasm) to execute GGUF-formatted models on the client's CPU, running inference inside a Web Worker so the UI stays responsive. The package offers both high-level APIs (completions, embeddings) and low-level control (KV cache, sampling), and handles large models by downloading split GGUF files in parallel, working around the browser's 2 GB size limit on a single file. The result is local, private, and portable LLM capability across a wide range of devices.
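As a sketch of the high-level flow, the snippet below loads a GGUF model from a URL and runs a text completion. It follows the `@wllama/wllama` TypeScript API (`Wllama`, `loadModelFromUrl`, `createCompletion`); the asset paths and model URL are placeholders you would replace with your own hosted files, and the exact config keys may vary between package versions.

```typescript
import { Wllama } from '@wllama/wllama';

// Paths to the Wasm binaries shipped with the package, served as static
// assets by your web app. Placeholder paths; adjust to your bundler setup.
const CONFIG_PATHS = {
  'single-thread/wllama.wasm': '/assets/single-thread/wllama.wasm',
  'multi-thread/wllama.wasm': '/assets/multi-thread/wllama.wasm',
};

async function main(): Promise<void> {
  const wllama = new Wllama(CONFIG_PATHS);

  // Placeholder model URL. For models over the 2 GB single-file limit,
  // pointing at the first split shard lets the shards be fetched in parallel.
  await wllama.loadModelFromUrl(
    'https://example.com/models/tinyllama-q4_k_m.gguf'
  );

  // Generate up to 64 tokens of completion text for the prompt.
  const output = await wllama.createCompletion('Q: What is WebAssembly?\nA:', {
    nPredict: 64,
  });
  console.log(output);
}

main();
```

Because inference runs in a worker on the client's CPU, the `await` calls above resolve without blocking the page's main thread.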
Related technologies
Recent Talks & Demos