

TinyLlama

A compact 1.1B-parameter Llama-2-style model trained on 3 trillion tokens for high-performance edge deployment.

TinyLlama delivers a lightweight 1.1B-parameter architecture optimized for mobile and embedded hardware. The team trained it on 3 trillion tokens over roughly 90 days using 16 A100-40G GPUs, pushing far beyond the compute-optimal token budget that scaling laws would prescribe for a model this size. It adopts the standard Llama-2 architecture and tokenizer (with FlashAttention, RMSNorm, and SwiGLU), so it stays drop-in compatible with the existing open-source Llama ecosystem. At 4-bit quantization the weights fit in under 1 GB of VRAM, making it a strong choice for real-time local inference and on-device fine-tuning.
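The sub-1 GB figure follows from simple arithmetic on the parameter count. A minimal sketch of that calculation is below; the 5% overhead factor for quantization scales and zero-points is an illustrative assumption, not a published number:

```python
# Back-of-the-envelope check of the "4-bit quantization under 1 GB" claim.
# Only the 1.1B parameter count comes from the text; the overhead factor
# for quantization metadata is an assumption for illustration.

def quantized_footprint_gb(n_params: float, bits_per_weight: int,
                           overhead: float = 1.05) -> float:
    """Approximate memory needed to hold the weights alone, in GiB."""
    total_bytes = n_params * bits_per_weight / 8 * overhead
    return total_bytes / 1024**3

fp16 = quantized_footprint_gb(1.1e9, 16, overhead=1.0)  # ~2.05 GiB at fp16
q4 = quantized_footprint_gb(1.1e9, 4)                   # ~0.54 GiB at 4-bit

print(f"fp16: {fp16:.2f} GiB, 4-bit: {q4:.2f} GiB")
assert q4 < 1.0  # consistent with the sub-1GB claim above
```

Note this counts weights only; the KV cache and activations add to the total at inference time, which is why quantized 1.1B models are a comfortable rather than tight fit on 1-2 GB devices.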

https://github.com/jzhang38/TinyLlama
