Technology

Jan-nano-gguf

Jan-nano-gguf is a high-performance, 4-bit quantized Llama-3 model optimized for local inference on consumer-grade hardware.

Built by the Jan team, this GGUF build of an 8-billion-parameter model delivers low-latency responses directly on your desktop. Its 4-bit quantization reduces the memory footprint (RAM or VRAM) while reportedly maintaining 95% of the original model's accuracy. It runs in the Jan desktop app and in llama.cpp, enabling private, offline text generation for developers and researchers without cloud dependencies.
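
To make the memory claim concrete, here is a rough back-of-the-envelope sketch of the weight storage for an 8-billion-parameter model at 16 bits versus 4 bits per weight. It assumes an idealized 4.0 bits per weight and counts weights only. Real 4-bit GGUF files also store per-block scale data, so they come out somewhat larger, and runtime memory adds the context cache on top. The function name `weight_memory_gib` is a hypothetical helper for illustration, not part of Jan or llama.cpp.

```python
def weight_memory_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in GiB."""
    total_bytes = n_params * bits_per_weight / 8
    return total_bytes / 2**30


N_PARAMS = 8e9  # 8 billion parameters, as stated in the description

fp16_gib = weight_memory_gib(N_PARAMS, 16)  # unquantized 16-bit baseline
q4_gib = weight_memory_gib(N_PARAMS, 4)     # idealized 4-bit quantization (assumption)

print(f"16-bit weights: {fp16_gib:.1f} GiB")          # 14.9 GiB
print(f"4-bit weights:  {q4_gib:.1f} GiB")            # 3.7 GiB
print(f"reduction:      {fp16_gib / q4_gib:.0f}x")    # 4x
```

Under these assumptions the quantized weights fit in under 4 GiB, which is why a model of this size becomes practical on consumer-grade hardware.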

https://huggingface.co/jan-hq/Jan-Llama-3-8B-Instruct-GGUF

1 project · 1 city

Related technologies

AI agents (35)
dice rolls (1)
Docker (128)
granite-embedding 278m (1)
granite-embedding 278m for RAG (1)
monsters (1)
Qwen-2 (3)
RAG (138)
Scratch (2)

Recent Talks & Demos

Compose and Dragons: Tiny LLMs

Paris Apr 21

Tags: Docker, Jan-nano-gguf