Technology
Zebra-Llama
Zebra-Llama is an efficient hybrid LLM architecture (SSM + MLA) delivering Transformer-level accuracy with near-SSM inference speed and dramatically reduced KV cache memory.
Zebra-Llama is a family of 1B-, 3B-, and 8B-parameter hybrid language models that combine State Space Model (SSM) layers with Multi-head Latent Attention (MLA). The hybrid design matches Transformer-level accuracy while sharply reducing resource demands; for example, the 1B variant shrinks the KV cache to as little as 3.9% of its original size. Because the models are derived from existing pre-trained models using only 7–11 billion training tokens, a small fraction of the trillions required for pre-training from scratch, Zebra-Llama offers a practical, scalable route to building high-performance, efficient LLMs.
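The sketch below (PyTorch, written for this page; it is not the released Zebra-Llama code) illustrates the general idea behind such a hybrid design: most layers are SSM-style mixers whose per-sequence state is a single fixed-size vector, while the remaining layers use MLA, which caches a small latent tensor and re-expands it into keys and values at attention time. All class names, layer ratios, and sizes (d_latent, mla_every, and so on) are illustrative assumptions, not the actual architecture details.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MLABlock(nn.Module):
    """Multi-head Latent Attention, simplified: keys and values are cached as a
    small latent tensor rather than as full per-head K/V tensors."""

    def __init__(self, d_model: int, n_heads: int, d_latent: int):
        super().__init__()
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model)
        self.kv_down = nn.Linear(d_model, d_latent)   # compress tokens into a latent; only this is cached
        self.k_up = nn.Linear(d_latent, d_model)      # re-expand the latent into keys at attention time
        self.v_up = nn.Linear(d_latent, d_model)      # ... and into values
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x, kv_cache=None):
        b, t, d = x.shape
        latent = self.kv_down(x)                                  # (b, t, d_latent)
        if kv_cache is not None:
            latent = torch.cat([kv_cache, latent], dim=1)         # grow the latent cache
        q = self.q_proj(x).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        k = self.k_up(latent).view(b, -1, self.n_heads, self.d_head).transpose(1, 2)
        v = self.v_up(latent).view(b, -1, self.n_heads, self.d_head).transpose(1, 2)
        # Causal mask during prefill; during one-token decoding, attend to the whole cache.
        out = F.scaled_dot_product_attention(q, k, v, is_causal=kv_cache is None)
        out = out.transpose(1, 2).reshape(b, t, d)
        return self.out_proj(out), latent                         # latent is the new KV cache


class SSMBlock(nn.Module):
    """Stand-in for an SSM mixer: a gated linear recurrence whose cached state is
    one vector per channel, independent of sequence length."""

    def __init__(self, d_model: int):
        super().__init__()
        self.in_proj = nn.Linear(d_model, 2 * d_model)
        self.decay = nn.Parameter(torch.rand(d_model))            # per-channel decay parameter
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x, state=None):
        u, gate = self.in_proj(x).chunk(2, dim=-1)
        a = torch.sigmoid(self.decay)                             # keep the recurrence in (0, 1)
        h = torch.zeros_like(u[:, 0]) if state is None else state
        outs = []
        for step in range(u.size(1)):                             # sequential scan over time
            h = a * h + (1 - a) * u[:, step]
            outs.append(h)
        y = torch.stack(outs, dim=1) * F.silu(gate)               # gated output
        return self.out_proj(y), h                                # h is the only state to cache


class HybridStack(nn.Module):
    """Interleave SSM mixers with occasional MLA layers (ratio and sizes arbitrary)."""

    def __init__(self, d_model=256, n_heads=4, d_latent=32, n_layers=6, mla_every=3):
        super().__init__()
        self.layers = nn.ModuleList(
            [MLABlock(d_model, n_heads, d_latent) if (i + 1) % mla_every == 0
             else SSMBlock(d_model) for i in range(n_layers)]
        )
        self.norms = nn.ModuleList([nn.LayerNorm(d_model) for _ in range(n_layers)])

    def forward(self, x, caches=None):
        caches = caches or [None] * len(self.layers)
        new_caches = []
        for layer, norm, cache in zip(self.layers, self.norms, caches):
            y, cache = layer(norm(x), cache)
            x = x + y                                             # residual connection
            new_caches.append(cache)
        return x, new_caches


if __name__ == "__main__":
    model = HybridStack()
    x = torch.randn(2, 16, 256)          # (batch, sequence length, d_model)
    y, caches = model(x)
    print(y.shape)                       # torch.Size([2, 16, 256])
```

With these toy sizes, each MLA layer caches one latent of width 32 per token instead of full keys and values of combined width 512, and each SSM layer caches only a fixed-size state, which is the kind of reduction behind the small KV-cache figures quoted above.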
Related technologies
Recent Talks & Demos