LoRA (peft)
LoRA (Low-Rank Adaptation) is a Parameter-Efficient Fine-Tuning (PEFT) method: it drastically reduces trainable parameters by decomposing large weight matrices into smaller, low-rank matrices.
LoRA is the go-to PEFT technique for fine-tuning large models fast. It freezes the original pretrained weights and injects small, low-rank update matrices (A and B) into key layers, typically the attention mechanism. This decomposition slashes the number of trainable parameters: a $1024 \times 1024$ matrix ($1{,}048{,}576$ parameters) shrinks by $64\times$ to just $16{,}384$ trainable parameters with a rank $r=8$ configuration. The result: significantly lower VRAM usage, faster training cycles, and the ability to fine-tune massive models—like Llama-2-7b—on consumer-grade GPUs. Advanced variants like QLoRA and DoRA further optimize this process for performance and efficiency.
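The decomposition above can be sketched in a few lines of NumPy. This is an illustrative toy, not the `peft` API: the variable names (`W`, `A`, `B`, `lora_forward`) are our own, and a real implementation would train `A` and `B` with an optimizer while keeping `W` frozen.

```python
import numpy as np

d_in, d_out, r = 1024, 1024, 8
rng = np.random.default_rng(0)

W = rng.standard_normal((d_out, d_in))     # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01  # trainable low-rank down-projection
B = np.zeros((d_out, r))                   # trainable up-projection, zero-initialized
                                           # so training starts from the pretrained model

def lora_forward(x):
    # Effective weight is W + B @ A, applied without materializing the sum.
    return W @ x + B @ (A @ x)

x = rng.standard_normal(d_in)
y = lora_forward(x)

full_params = W.size            # 1,048,576
lora_params = A.size + B.size   # 8*1024 + 1024*8 = 16,384
print(full_params // lora_params)  # → 64
```

Because `B` starts at zero, the adapted layer initially reproduces the pretrained output exactly; fine-tuning then moves only the $16{,}384$ adapter parameters.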