
LoRA Fine-Tuning

LoRA (Low-Rank Adaptation) is a Parameter-Efficient Fine-Tuning (PEFT) method: it freezes a large model's weights and injects small, trainable rank decomposition matrices, cutting training parameters by up to 10,000x.

LoRA makes it practical to adapt massive foundation models (e.g., GPT-3 175B) without the prohibitive cost of full fine-tuning. The mechanism is simple: freeze the original pre-trained weights, then inject two small low-rank matrices (A and B) into the Transformer's attention layers and train only those. For GPT-3 175B, this reduces the number of trainable parameters by up to 10,000x and cuts GPU memory requirements by roughly 3x. LoRA matches or exceeds full fine-tuning quality on models such as RoBERTa and DeBERTa, and because the low-rank update can be merged back into the frozen weights after training, it adds no inference latency. These properties have made it a standard approach for domain adaptation.
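A minimal NumPy sketch of the idea (toy sizes chosen for illustration; real LoRA targets the attention projection matrices of a Transformer): the frozen weight W is augmented with a trainable update B·A of rank r, scaled by alpha/r as in the paper, and the update can be merged into W for latency-free inference.

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen pre-trained weight. In a real model this would be an attention
# projection; the sizes here are hypothetical, for illustration only.
d_in, d_out, r = 64, 64, 4          # r << d is the low-rank bottleneck
W = rng.standard_normal((d_out, d_in))

# Trainable LoRA factors. B starts at zero, so at initialization the
# adapted model is exactly the frozen model (as in the paper).
A = rng.standard_normal((r, d_in)) * 0.01
B = np.zeros((d_out, r))
alpha = 8                            # scaling hyperparameter

def forward(x):
    # h = W x + (alpha / r) * B A x -- only A and B would receive gradients.
    return W @ x + (alpha / r) * (B @ A @ x)

x = rng.standard_normal(d_in)
assert np.allclose(forward(x), W @ x)   # B == 0 => identical at init

# After training, the update merges into W, so inference pays no extra cost:
W_merged = W + (alpha / r) * (B @ A)
assert np.allclose(W_merged @ x, forward(x))

# Parameter comparison: full fine-tuning vs. LoRA on this layer.
full_params = W.size                 # 64 * 64 = 4096
lora_params = A.size + B.size        # 4 * 64 + 64 * 4 = 512
print(full_params, lora_params)
```

Even at this toy scale, the trainable-parameter count drops by 8x; the ratio grows with the hidden dimension, which is where the headline 10,000x figure for GPT-3 comes from.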

https://arxiv.org/abs/2106.09685


