
LoRA Fine-Tuning

LoRA (Low-Rank Adaptation) is a Parameter-Efficient Fine-Tuning (PEFT) method: it freezes a large model's weights and injects small, trainable rank decomposition matrices, cutting training parameters by up to 10,000x.

LoRA makes it practical to adapt massive foundation models (e.g., GPT-3 175B) without the prohibitive cost of full fine-tuning. The mechanism is simple: freeze the original pre-trained weights, then inject two small low-rank matrices (A and B) into the Transformer's attention layers and train only those. For GPT-3 175B, this reduces the number of trainable parameters by up to 10,000x and cuts GPU memory requirements by roughly 3x. LoRA matches or exceeds full fine-tuning quality on models such as RoBERTa and DeBERTa, and because the low-rank update can be merged back into the frozen weights after training, it adds no inference latency. These properties have made it a standard approach for domain adaptation.
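A minimal NumPy sketch of the idea (toy sizes chosen for illustration; real LoRA targets the attention projection matrices of a Transformer): the frozen weight W is augmented with a trainable update B·A of rank r, scaled by alpha/r as in the paper, and the update can be merged into W for latency-free inference.

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen pre-trained weight. In a real model this would be an attention
# projection; the sizes here are hypothetical, for illustration only.
d_in, d_out, r = 64, 64, 4          # r << d is the low-rank bottleneck
W = rng.standard_normal((d_out, d_in))

# Trainable LoRA factors. B starts at zero, so at initialization the
# adapted model is exactly the frozen model (as in the paper).
A = rng.standard_normal((r, d_in)) * 0.01
B = np.zeros((d_out, r))
alpha = 8                            # scaling hyperparameter

def forward(x):
    # h = W x + (alpha / r) * B A x -- only A and B would receive gradients.
    return W @ x + (alpha / r) * (B @ A @ x)

x = rng.standard_normal(d_in)
assert np.allclose(forward(x), W @ x)   # B == 0 => identical at init

# After training, the update merges into W, so inference pays no extra cost:
W_merged = W + (alpha / r) * (B @ A)
assert np.allclose(W_merged @ x, forward(x))

# Parameter comparison: full fine-tuning vs. LoRA on this layer.
full_params = W.size                 # 64 * 64 = 4096
lora_params = A.size + B.size        # 4 * 64 + 64 * 4 = 512
print(full_params, lora_params)
```

Even at this toy scale, the trainable-parameter count drops by 8x; the ratio grows with the hidden dimension, which is where the headline 10,000x figure for GPT-3 comes from.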

https://arxiv.org/abs/2106.09685


