LoRA (peft)
LoRA (Low-Rank Adaptation) is a Parameter-Efficient Fine-Tuning (PEFT) method: it drastically reduces trainable parameters by decomposing large weight matrices into smaller, low-rank matrices.
LoRA is the go-to PEFT technique for fine-tuning large models fast. It freezes the original pretrained weights and injects small, low-rank update matrices (A and B) into key layers, typically the attention mechanism. This decomposition slashes the number of trainable parameters: a $1024 \times 1024$ matrix ($1{,}048{,}576$ parameters) shrinks by $64\times$ to just $16{,}384$ trainable parameters with a rank $r=8$ configuration. The result: significantly lower VRAM usage, faster training cycles, and the ability to fine-tune massive models—like Llama-2-7b—on consumer-grade GPUs. Advanced variants like QLoRA and DoRA further optimize this process for performance and efficiency.
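The decomposition above can be sketched in a few lines of NumPy. This is an illustrative toy, not the `peft` API: the variable names (`W`, `A`, `B`, `lora_forward`) are our own, and a real implementation would train `A` and `B` with an optimizer while keeping `W` frozen.

```python
import numpy as np

d_in, d_out, r = 1024, 1024, 8
rng = np.random.default_rng(0)

W = rng.standard_normal((d_out, d_in))     # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01  # trainable low-rank down-projection
B = np.zeros((d_out, r))                   # trainable up-projection, zero-initialized
                                           # so training starts from the pretrained model

def lora_forward(x):
    # Effective weight is W + B @ A, applied without materializing the sum.
    return W @ x + B @ (A @ x)

x = rng.standard_normal(d_in)
y = lora_forward(x)

full_params = W.size            # 1,048,576
lora_params = A.size + B.size   # 8*1024 + 1024*8 = 16,384
print(full_params // lora_params)  # → 64
```

Because `B` starts at zero, the adapted layer initially reproduces the pretrained output exactly; fine-tuning then moves only the $16{,}384$ adapter parameters.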